ABSTRACT

We consider the one-step prediction problem for discrete-time
linear systems in correlated Gaussian white plant and observation
noises, and non-Gaussian initial conditions. Explicit representations
are obtained for the MMSE and LLSE (or Kalman) estimates
of the state given past observations, as well as for the expected
square of their difference. These formulae are obtained with the
help of the Girsanov transformation for Gaussian white noise sequences,
and explicitly display the effects of the distribution of the
initial condition. With the help of these formulae, we investigate
the large-time asymptotics of $\varepsilon_t$, the expected squared difference
between the MMSE and LLSE estimates at time $t$. We characterize
the limit of the error sequence $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ and obtain
some related rates of convergence. A complete large-time analysis
is provided for the scalar case.

I. INTRODUCTION


In his seminal paper of 1960, Kalman [13] developed a method of estimating
the state of a noisy linear dynamical system based upon linear
observations corrupted by additive Gaussian white noise. The import of
Kalman’s state-space approach was that it provided a dynamical and recursive,
and hence computable, description of the estimator, thereby overcoming,
for many practical problems, the restrictive stationarity assumptions
of the Wiener-Hopf-Kolmogorov theory of linear filtering. Kalman’s
state-space approach renewed intense interest in filtering theory, eventually
leading to a clearer understanding of the general problem of filtering
a nonlinear dynamical plant given nonlinear observations. Advances in
nonlinear filtering theory have, in turn, motivated fundamental and far-reaching
breakthroughs in a wide range of other probabilistic questions,
from stochastic control theory [8] to martingale theory [8, 12] to stochastic
partial differential equations [24-25].

We shall revisit, in this chapter, Kalman’s problem. His model, as we
shall understand it here, is that of an $\mathbb{R}^n$-valued plant process evolving
according to the stochastic discrete-time linear dynamical equations

$$X^o_0 = \xi, \qquad X^o_{t+1} = A_t X^o_t + W^o_{t+1}, \qquad t = 0, 1, \ldots \tag{1.1a}$$

This plant process describes the evolution of some quantity of interest, the
so-called system state, e.g., the amount of a quantity in a chemical reaction
or the position and velocity of an orbiting satellite. Unfortunately, full state
information is often not available and we can only measure a sequence of
$\mathbb{R}^k$-valued observations which are given by the linear equations

$$Y_t = H_t X^o_t + V^o_{t+1}, \qquad t = 0, 1, \ldots \tag{1.1b}$$

To be rigorous, we denote by $(\Omega, \mathcal{F}, P^o)$ an underlying probability triple on
which all random variables (rvs) are defined. Of course, the matrices $A_t$
and $H_t$ are respectively of size $n \times n$ and $k \times n$ for each $t = 0, 1, \ldots$. The
statistics of the random noise processes $W^o = \{W^o_{t+1};\ t = 0, 1, \ldots\}$ and
$V^o = \{V^o_{t+1};\ t = 0, 1, \ldots\}$ are governed by the following assumptions:

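(A.1): The process $(W^o, V^o)$ is a zero-mean Gaussian white noise
(GWN) sequence with covariance structure $\Sigma = \{\Sigma_{t+1};\ t = 0, 1, \ldots\}$
given by

$$\Sigma_{t+1} := E^o\left[ \begin{pmatrix} W^o_{t+1} \\ V^o_{t+1} \end{pmatrix} \begin{pmatrix} W^o_{t+1} \\ V^o_{t+1} \end{pmatrix}' \right] = \begin{pmatrix} \Sigma^{w}_{t+1} & \Sigma^{wv}_{t+1} \\ \Sigma^{vw}_{t+1} & \Sigma^{v}_{t+1} \end{pmatrix}, \qquad t = 0, 1, \ldots \tag{1.2}$$

i.e., the $\mathbb{R}^{n+k}$-valued rvs $\{(W^o_{t+1}, V^o_{t+1});\ t = 0, 1, \ldots\}$ are mutually
independent zero-mean Gaussian rvs with covariances (1.2);
and

(A.2): For all $t = 1, 2, \ldots$, the covariance matrix $\Sigma_t$ is positive definite
(and thus invertible).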

The  reader  is  referred  to  [7,  27]  for  background  material  on  GWN  sequences. 
In the classical Kalman filtering model, the statistics of the initial condition
$\xi$ are assumed to be governed by

(K): The initial condition $\xi$ is a Gaussian rv with mean $\mu$ and covariance
$\Delta$, and is independent of the process $(W^o, V^o)$.

We note that there is an analogous continuous-time formulation of (1.1a)-(1.1b)
using Itô equations [8, 12, 14, 17]. However, we have elected here
to study the discrete-time model in order to minimize technicalities and
since, in applications, at most a finite number of observations are usually
recorded. We also note, in passing, that the superscript $o$ on the plant
$X^o = \{X^o_t;\ t = 0, 1, \ldots\}$, noises $(W^o, V^o) = \{(W^o_{t+1}, V^o_{t+1});\ t = 0, 1, \ldots\}$
and measure $P^o$ indicates that these are ‘original’ components of the model,
to be distinguished from auxiliary plant and noise processes and probability
measures which we define in the course of the analysis.

In [13], Kalman then posed the problem of estimating in the minimum-mean-square-error
sense the state $X^o_{t+1}$ of the plant given the observations
$Y_0, Y_1, \ldots, Y_t$, for each $t = 0, 1, \ldots$. In particular, he set out to compute
the conditional means

$$\hat X_{t+1} := E^o[X^o_{t+1} \mid \mathcal{Y}_t], \qquad t = 0, 1, \ldots \tag{1.3}$$

where the $\sigma$-field $\mathcal{Y}_t$ is defined by

$$\mathcal{Y}_t := \sigma\{Y_0, Y_1, \ldots, Y_t\}, \qquad t = 0, 1, \ldots$$

In the process of doing so, he also solved the generalized one-step prediction
problem, which is defined as the finding of the conditional law of $X^o_{t+1}$ given
the $\sigma$-field $\mathcal{Y}_t$. We may alternately formulate this latter question as the
simultaneous evaluation of the conditional expectations

$$E^o[\varphi(X^o_{t+1}) \mid \mathcal{Y}_t], \qquad t = 0, 1, \ldots \tag{1.4}$$

for all bounded Borel mappings $\varphi : \mathbb{R}^n \to \mathbb{C}$, with $\mathbb{C}$ denoting the
set of complex numbers. Under assumptions (A.1), (A.2) and (K),
the linearity of (1.1a)-(1.1b) implies that for each $t = 0, 1, \ldots$, the rvs
$\{X^o_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ are jointly Gaussian and therefore $X^o_{t+1}$ is conditionally
Gaussian given $\mathcal{Y}_t$ [2, Sec. 2.2]. The generalized one-step prediction
problem is then solved once two sequences of finite-dimensional sufficient
statistics are known, namely the conditional means of (1.3) and the conditional
covariances

$$P_{t+1} := E^o[(X^o_{t+1} - \hat X_{t+1})(X^o_{t+1} - \hat X_{t+1})' \mid \mathcal{Y}_t], \qquad t = 0, 1, \ldots \tag{1.5}$$

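with $'$ denoting transpose. Kalman’s breakthrough lies in showing that the
processes $\hat X = \{\hat X_{t+1};\ t = 0, 1, \ldots\}$ and $P = \{P_{t+1};\ t = 0, 1, \ldots\}$ can be
described by dynamical recursions [2, pp. 38-39]. These recursions are
given by the following coupled system

$$\hat X^K_0 = \mu, \qquad \hat X^K_{t+1} = A_t \hat X^K_t + [A_t P^K_t H_t' + \Sigma^{wv}_{t+1}][H_t P^K_t H_t' + \Sigma^{v}_{t+1}]^{-1}(Y_t - H_t \hat X^K_t), \qquad t = 0, 1, \ldots \tag{1.6}$$

and

$$P^K_0 = \Delta, \qquad P^K_{t+1} = A_t P^K_t A_t' + \Sigma^{w}_{t+1} - [A_t P^K_t H_t' + \Sigma^{wv}_{t+1}][H_t P^K_t H_t' + \Sigma^{v}_{t+1}]^{-1}[A_t P^K_t H_t' + \Sigma^{wv}_{t+1}]', \qquad t = 0, 1, \ldots \tag{1.7}$$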

Under assumption (K), we have the following identities

$$\hat X_{t+1} = \hat X^K_{t+1} \quad \text{and} \quad P_{t+1} = P^K_{t+1}, \qquad t = 0, 1, \ldots \tag{1.8}$$

It will shortly become apparent why we have separately stated the definitions
(1.3), (1.5) and the recursions (1.6)-(1.7).
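
The recursions (1.6)-(1.7) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names `kalman_predictor_step` and `run_predictor`, and the argument names `Sw`, `Swv`, `Sv` (standing for the noise covariance blocks of the plant noise, the cross term and the observation noise), are assumptions introduced here, and the system is taken to be time-invariant for brevity.

```python
import numpy as np

def kalman_predictor_step(x_hat, P, y, A, H, Sw, Swv, Sv):
    """One step of (1.6)-(1.7): (x_hat_t, P_t, Y_t) -> (x_hat_{t+1}, P_{t+1})."""
    cross = A @ P @ H.T + Swv        # A P H' + cross-covariance of plant and observation noise
    innov_cov = H @ P @ H.T + Sv     # H P H' + observation noise covariance
    gain = cross @ np.linalg.inv(innov_cov)
    x_next = A @ x_hat + gain @ (y - H @ x_hat)
    P_next = A @ P @ A.T + Sw - gain @ cross.T
    return x_next, P_next

def run_predictor(ys, mu, Delta, A, H, Sw, Swv, Sv):
    """Iterate the step from the initial moments (mu, Delta) over Y_0, Y_1, ..."""
    x_hat, P = mu, Delta
    for y in ys:
        x_hat, P = kalman_predictor_step(x_hat, P, y, A, H, Sw, Swv, Sv)
    return x_hat, P
```

Only the first two moments of the initial condition enter this computation, which is why the same recursion delivers the LLSE estimate whether or not the initial condition is Gaussian.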

The goal of this paper is a modest one: to relax assumption (K),
replacing it with the more realistic assumption (A.3) given by

(A.3): The initial condition $\xi$ has distribution $F$ with finite first and
second moments $\mu$ and $\Delta$, respectively, and is independent of the
process $(W^o, V^o)$. No other a priori assumptions, save these on
the first two moments, are enforced on $F$.

We shall investigate in various ways how replacing assumption (K) by
assumption (A.3) affects the solution of the prediction problem. The discussion
below constitutes a synthesis of the material which has appeared
in the three papers [28-30]. We hope that this account will provide a valuable
complement to classical Kalman filtering, as the initial condition is in
practice a rather vaguely-defined object about which only first and second
moments are known.

The effect of replacing the classical assumption by (A.3) is dramatic;
the rvs $\{X^o_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ are no longer jointly Gaussian for each
$t = 0, 1, \ldots$, and thus we cannot a priori expect that the conditional law
of the state given the observations can be described by any finite collection
of sufficient statistics, e.g., $\hat X$ and $P$ as in (1.3) and (1.5). We also
note that in this more general case (1.8) no longer holds; the conditional
means and covariances of (1.3) and (1.5) no longer propagate according
to (1.6) and (1.7). Faced with this state of affairs, we might naturally
seek to directly describe the evolution of the conditional law of the state
given the observations, say by a straightforward use of Bayes’ rule or via
the discrete-time analogue of the celebrated Zakai equation of nonlinear
filtering [7, 32]. This is not an easy task, however, for studying the evolution
of this conditional law is tantamount to studying the evolution of
an infinite-dimensional sufficient statistic. Moreover, under the suggested
approaches, it seems quite difficult, if not impossible, to clearly follow, over
time, the precise influence of the initial distribution. The key to overcoming
these difficulties lies in the techniques of [22], where the filter is factored
into a collection of finite-dimensional and computable sufficient statistics
and a functional: the information of the observations is contained solely in
these statistics, while the initial distribution appears only in the structure
of the functional. These sufficient statistics obey recursions derived from
filtering an auxiliary system of the type (1.1a)-(1.1b) under Kalman’s original
assumptions (A.1)-(A.2) and (K). This provides us with a pleasing
reaffirmation of the centrality of Kalman’s results.

Aware of the demands of real applications, we shall not content ourselves
only with a solution of the generalized one-step prediction problem,
but shall address also the more practical question of state estimation (as
in (1.3)). This will be of more direct interest to the engineer, who in general
is not concerned with the entire conditional law, but rather with some
appropriate estimate of the true state of the plant. For each $t = 0, 1, \ldots$,
the estimation of the state $X^o_{t+1}$ given the observation $\sigma$-field $\mathcal{Y}_t$ is often
defined as the problem of finding a Borel measurable $\varphi : (\mathbb{R}^k)^{t+1} \to \mathbb{R}^n$
which minimizes the mean-square error

$$E^o\left[ \| X^o_{t+1} - \varphi(Y_0, Y_1, \ldots, Y_t) \|^2 \right] \tag{1.9}$$

over some allowable class of Borel measurable functions. If we minimize
(1.9) over all Borel measurable mappings $\varphi$, we get the minimum mean
square error (or MMSE) estimate, while if we minimize (1.9) only over all
affine mappings $\varphi$, we get the linear least square error (or LLSE) estimate.
In fact, the MMSE and LLSE estimators are objects which have already
been introduced: It is well known indeed that the MMSE estimators coincide
with the sequence (1.3) of conditional means [2, Thm. 2.3.1], and
that the LLSE estimator propagates according to (1.6)-(1.7), with $P^K$ being
the sequence of corresponding error covariances [2, Sec. 5.4]. As we
remarked in (1.8), under the Gaussian assumption (K), $\hat X_{t+1} = \hat X^K_{t+1}$ for
all $t = 0, 1, \ldots$ and the MMSE and LLSE estimators coincide. But as soon
as we pass to assumption (A.3), the minimization of (1.9) over all Borel
measurable mappings $\varphi$ is not the same as the minimization of (1.9) over
all affine mappings $\varphi$, so that in general the MMSE and LLSE estimators
will not agree. The difference between the MMSE and LLSE estimators is
a direct consequence of having a non-Gaussian initial condition.
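
To make the gap between the two estimators concrete, here is a small scalar sketch. It is an illustrative assumption, not an example from the source: $n = k = 1$, uncorrelated noises, and an initial condition $\xi$ equal to $\pm 1$ with probability $1/2$ each (so $\mu = 0$, $\Delta = 1$). For a single observation $Y_0$ the conditional mean is then $a \tanh(h y / \sigma_v)$ with $\sigma_v$ the observation noise variance, while (1.6) gives the affine estimate $a h y / (h^2 + \sigma_v)$. The function name `mmse_vs_llse` and its parameters are hypothetical; expectations are computed by Gauss-Hermite quadrature.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def mmse_vs_llse(a=0.9, h=1.0, sw=1.0, sv=1.0, nodes=80):
    """Return (mse_mmse, mse_llse, eps1) for predicting X_1 from Y_0.

    Assumes xi = +/-1 with probability 1/2 and uncorrelated noises with
    variances sw (plant) and sv (observation).
    """
    z, w = hermegauss(nodes)          # nodes/weights for the weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)      # normalise so that sum(w * f(z)) ~ E[f(Z)], Z ~ N(0, 1)

    def mmse(y):                      # E[X_1 | Y_0 = y] = a * E[xi | Y_0 = y]
        return a * np.tanh(h * y / sv)

    def llse(y):                      # (1.6) at t = 0 with mu = 0, Delta = 1, no cross term
        return a * h * y / (h * h + sv)

    mse_mmse = mse_llse = eps1 = 0.0
    for xi in (-1.0, 1.0):            # average over the two values of xi
        y = h * xi + np.sqrt(sv) * z  # Y_0 given xi
        mse_mmse += 0.5 * np.sum(w * (a * xi - mmse(y)) ** 2)
        mse_llse += 0.5 * np.sum(w * (a * xi - llse(y)) ** 2)
        eps1 += 0.5 * np.sum(w * (mmse(y) - llse(y)) ** 2)
    # The plant noise W_1 is independent of (xi, Y_0) and adds sw to both errors.
    return mse_mmse + sw, mse_llse + sw, eps1
```

In this sketch the LLSE error agrees with $P^K_1$ from (1.7), the MMSE error is strictly smaller, and the gap between the two errors equals the mean-square distance between the two estimates.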

We shall in this paper not only provide computable expressions for the
MMSE and LLSE estimators, but also study properties of their difference.
We are directed to this study for two reasons. Firstly, this difference, as
we mentioned above, is a direct consequence of relaxing (K) to (A.3).
Secondly, an understanding of this difference might be useful to the engineer, who,
due to computing restrictions, often constructs the LLSE estimator as an
approximation of the more accurate MMSE estimator. We shall study the
MMSE-LLSE difference by considering the mean-square error

$$\varepsilon_t := E^o\left[ \| \hat X_t - \hat X^K_t \|^2 \right], \qquad t = 1, 2, \ldots \tag{1.10}$$

After deriving a formula for (1.10), we shall then proceed to an asymptotic
analysis of the sequence $\varepsilon = \{\varepsilon_t;\ t = 1, 2, \ldots\}$ under the classical assumption
of time-homogeneity of the plant and observation dynamics, and noise
correlation structure. We shall in particular be interested in situations in
which the LLSE estimates are asymptotically the same as the MMSE estimates,
i.e., when $\lim_t \varepsilon_t = 0$. In these cases, we shall also give a more
refined analysis of the rate of this convergence. These cases provide a formal
justification of the idea widely held by practitioners that short of first and
second moment information, precise distributional assumptions on the initial
condition can be dispensed with when estimating the plant process on
the basis of the observations. Under the assumption of time-homogeneity
enforced in this asymptotic analysis, we may write

$$\varepsilon_t = \varepsilon_t((A, H, \Sigma), F), \qquad t = 1, 2, \ldots \tag{1.11}$$

where $A$, $H$ and $\Sigma$ are respectively the time-invariant state and observation
gain matrices and noise correlation structure. For each $t = 1, 2, \ldots$, this
representation displays the dependence of $\varepsilon_t$ on the system triple $(A, H, \Sigma)$
and on the initial distribution $F$, thus providing a natural organization of
our study of the asymptotics of $\varepsilon$. We hope that as a consequence of this
analysis, there will emerge a much more precise understanding of the effects
of the initial condition on the filtering of (1.1a)-(1.1b).

The reader familiar with more recent developments in filtering theory
will already be acquainted with some of the tools used here, namely
the Girsanov measure transformation and the Kallianpur-Striebel formula:
The former will be used to define a new probability measure under which
explicit calculations can be performed, while the latter will be invoked to
relate the conditional expectations of (1.4) to corresponding conditional
expectations under the new measure. In essence, the arguments in this
paper amount to pushing the nonlinear effects of the non-Gaussian initial
condition into the probability measure. To do this, we take as a pattern the
techniques of [22] which solve the filtering problem in the corresponding
continuous-time case when the plant and observation noises are uncorrelated.
It will be of some interest, in fact, to see how the calculations of
[22] can be generalized to handle correlated plant and observation noises.
A pleasing discovery awaits us in that the structure of the solution of the
prediction problem with correlated noise is essentially the same as that for
uncorrelated noise. The only difference lies in the dynamics of a collection
of finite-dimensional sufficient statistics, while the functional dependence of
the predictor upon these statistics remains the same as in the uncorrelated
case.

The complete organization of this paper is as follows. Some notational
conventions are collected in Section II for easy reference. Section III provides
a review of the discrete-time Girsanov transform. In Section IV, we
carry over to the discrete-time context the arguments developed in [22]
for handling the continuous-time problem when the plant and observation
noises are uncorrelated and the observation noise sequence $V^o$ is standard.
We then show in Section V how to modify these ideas in order to solve
the prediction problem in the case of correlated noises. In Section VI we
use these results to obtain computable expressions for both the MMSE and
LLSE estimators $\hat X$ and $\hat X^K$. We also apply the machinery developed thus
far to give a formula for the error process $\varepsilon$ of (1.10); this formula is presented
in Theorem 6.4. Section VII is devoted to a careful study of the
asymptotics of the expression of Theorem 6.4, yielding our most general
results about the asymptotics of $\varepsilon$ in the multivariable case. Section VIII,
a relatively short section, discusses a key technical result dealing with a
partial converse for the asymptotics of $\varepsilon$. We close with Section IX, which
contains an even more complete asymptotic analysis in the scalar case (i.e.,
when $n = k = 1$), when many of the expressions of Sections V-VII can be
simplified.

Several authors have considered various prediction, estimation and
filtering problems for (1.1a)-(1.1b) under assumptions (A.1)-(A.3); the
continuous-time filtering problem has been studied in [3, 11, 19, 22]. Variations
to the basic discrete-time model with a class of non-Gaussian white
noises have been discussed in [20-21] with applications to failure detection.
Related studies of the evolution of the conditional law of the plant given
the observations as a measure-valued Markov process are given in [15-16].

We would like to point out that in this chapter we have studied only the
one-step prediction problem. Of course, other estimation problems could
have been considered, namely, the so-called filtering and interpolation
problems, which are respectively the problems of estimating the states $X^o_t$
and $X^o_s$, $s = 0, 1, \ldots, t - 1$, on the basis of $\mathcal{Y}_t$ for each $t = 0, 1, \ldots$. These
problems can be addressed with methods similar to those presented here.
We have restricted our attention to the one-step prediction problem mainly
for calculational convenience and since it is the natural analogue of the
continuous-time filtering calculations [19, 22].

To the best of the authors’ knowledge, few results have been reported in
the literature on the large-time asymptotics of $\varepsilon$ for a general non-Gaussian
initial distribution. This may be due to the fact that the key representation
result (Theorem 6.4) has been derived only relatively recently [27, 29].

II.  NOTATION  AND  CONVENTIONS 

For the sake of easy reference, we have collected here the various notation
and conventions used throughout the paper:

The set of real numbers is denoted by $\mathbb{R}$, and $\mathbb{C}$ stands for the set
of complex numbers. Elements of $\mathbb{R}^n$ are viewed as column vectors and
transposition is denoted by $'$, so that $\|v\|^2 = v'v$ for every $v$ in $\mathbb{R}^n$.

For positive integers $n$ and $m$, we denote the space of $n \times m$ real
matrices by $\mathcal{M}_{n \times m}$; let $0_{n \times m}$ denote the zero element in $\mathcal{M}_{n \times m}$. When
$m = n$, we write $\mathcal{M}_n$ for the space $\mathcal{M}_{n \times n}$ of $n \times n$ real matrices, and we
denote by $\mathcal{Q}_n$ the cone of $n \times n$ symmetric positive semi-definite matrices.
We let $I_n$ and $O_n$ be the unit and zero elements in $\mathcal{M}_n$, respectively.

Elements of random or deterministic sequences will be set in regular
type; the corresponding boldface character will denote the sequence itself.
Examples which we have already introduced are the plant process $X^o =
\{X^o_t;\ t = 0, 1, \ldots\}$ and the covariance structure $\Sigma = \{\Sigma_t;\ t = 1, 2, \ldots\}$.

For any matrix $K$ in $\mathcal{M}_n$, with $\mathrm{sp}(K)$ denoting the set of all eigenvalues
of $K$, we set $\lambda_{\min}(K) := \min\{|\lambda| : \lambda \in \mathrm{sp}(K)\}$ and $\lambda_{\max}(K) := \max\{|\lambda| :
\lambda \in \mathrm{sp}(K)\}$; it is customary to call $\lambda_{\max}(K)$ the spectral radius of $K$ and
to denote it by $\rho(K)$. The mapping $\mathcal{M}_n \to \mathbb{R}$ given by

$$\|K\|_{op} := \sup_{v \neq 0} \frac{\|Kv\|}{\|v\|}, \qquad K \in \mathcal{M}_n \tag{2.1}$$

defines the operator norm on $\mathcal{M}_n$ induced by the Euclidean norm on $\mathbb{R}^n$.
However, since $\mathcal{M}_n$ is a finite-dimensional Banach space, all norms on $\mathcal{M}_n$
are equivalent [10, Thm. IX.2.1]. This will be valuable in some of our
limiting operations in the latter parts of this chapter, as we may safely take
entrywise limits. The following well-known facts about $\|\cdot\|_{op}$ will come in
handy:

$$\|K\|_{op}^2 = \lambda_{\max}(K'K), \qquad K \in \mathcal{M}_n, \tag{2.2}$$

and

$$\lambda_{\min}(K'K)\,\|v\|^2 \le \|Kv\|^2 \le \lambda_{\max}(K'K)\,\|v\|^2, \qquad K \in \mathcal{M}_n,\ v \in \mathbb{R}^n. \tag{2.3}$$
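
The facts (2.2) and (2.3) are easy to check numerically. The sketch below is only an illustration; the helper names `op_norm_sq` and `rayleigh_bounds` are introduced here and do not come from the source.

```python
import numpy as np

def op_norm_sq(K):
    """Squared operator norm via (2.2): the largest eigenvalue of K'K."""
    return float(np.linalg.eigvalsh(K.T @ K).max())

def rayleigh_bounds(K, v):
    """Return the three terms of the chain (2.3) for a given vector v."""
    eig = np.linalg.eigvalsh(K.T @ K)   # real, nonnegative eigenvalues of K'K
    norm_v_sq = float(v @ v)
    Kv = K @ v
    return float(eig.min()) * norm_v_sq, float(Kv @ Kv), float(eig.max()) * norm_v_sq
```

Since $K'K$ is symmetric positive semi-definite, its eigenvalues are real and nonnegative, so taking absolute values in the definitions of $\lambda_{\min}$ and $\lambda_{\max}$ changes nothing here.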

The constant mapping $\mathbb{R}^n \to \mathbb{R} : x \to 1$ is denoted by $\mathbf{1}$.

For each matrix $R$ in $\mathcal{Q}_n$, let $G_R$ denote the distribution of a zero-mean
$\mathbb{R}^n$-valued Gaussian rv with covariance $R$.

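The following notation will be useful in our representation result for
the conditional expectations of (1.4). For every $S$ in $\mathcal{Q}_{2n}$, let $X_S$ and $B_S$
denote generic $\mathbb{R}^n$-valued rvs such that $(X_S, B_S)$ is an $\mathbb{R}^{2n}$-valued zero-mean
Gaussian rv with covariance matrix $S$. For every bounded Borel
mapping $\varphi : \mathbb{R}^n \to \mathbb{C}$, we define the mappings $T_\varphi : \mathbb{R}^n \times \mathbb{R}^n \times \mathcal{Q}_{2n} \to \mathbb{C}$
and $U_\varphi : \mathbb{R}^n \times \mathbb{R}^n \times \mathcal{Q}_n \times \mathcal{M}_n \times \mathcal{Q}_{2n} \to \mathbb{C}$ by

$$T_\varphi[x, b; S] := E\left[ \varphi(x + X_S) \exp[b' B_S] \right], \qquad x, b \in \mathbb{R}^n,\ S \in \mathcal{Q}_{2n}, \tag{2.4}$$

and

$$U_\varphi[x, b; \Lambda, \Phi; S] := \int_{\mathbb{R}^n} T_\varphi[x + \Phi z, z; S] \exp\left[ b'z - \tfrac{1}{2} z' \Lambda z \right] dF(z), \qquad x, b \in \mathbb{R}^n,\ \Lambda \in \mathcal{Q}_n,\ \Phi \in \mathcal{M}_n,\ S \in \mathcal{Q}_{2n}, \tag{2.5}$$

with the understanding that $E$ denotes integration with respect to the Gaussian
distribution of the rv $(X_S, B_S)$.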

We denote by $\mathcal{D}(\mathbb{R}^n)$ the set of all square-integrable probability distribution
functions on $\mathbb{R}^n$ with positive definite (and thus invertible) covariance
matrix, and by $\mathcal{D}^0(\mathbb{R}^n)$ the set of those distributions in $\mathcal{D}(\mathbb{R}^n)$ which
have zero mean.


III.  THE  GIRSANOV  TRANSFORMATION 

Our efforts of Sections IV and V, where we derive expressions for the
conditional expectations (1.4), will rely crucially on the celebrated Girsanov
measure transformations for GWN sequences [7, 9]. To streamline the
arguments in Sections IV and V, we here summarize the properties of the
Girsanov transform.

The essence of the Girsanov transformation is the translation of a Gaussian
process. As evidence of the simple ideas at the heart of the Girsanov
transformation, we begin with the fact that the probability law of a Gaussian
rv with mean vector $m \neq 0$ and invertible covariance matrix $R$ is
absolutely continuous with respect to the law of a Gaussian rv with zero
mean and covariance matrix $R$. It is easy to see that the corresponding
Radon-Nikodym derivative is given by

$$\frac{dG_R(x - m)}{dG_R(x)} = \frac{\exp\left( -\tfrac{1}{2}(x - m)' R^{-1} (x - m) \right)}{\exp\left( -\tfrac{1}{2} x' R^{-1} x \right)} = \exp\left( x' R^{-1} m - \tfrac{1}{2} m' R^{-1} m \right), \qquad x \in \mathbb{R}^n;$$

we are in this simple calculation translating a Gaussian rv by a constant
$m$. The discrete-time Girsanov measure transformation is conceptually
very similar, but since we are dealing with processes, the class of allowable
translates turns out to be much richer.
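
The translation formula can be verified pointwise by comparing the ratio of the two Gaussian densities with the closed-form exponential. The following is a minimal numerical sketch; the function names `gaussian_density` and `translation_density_ratio` are assumptions made for illustration only.

```python
import numpy as np

def gaussian_density(x, mean, R):
    """Density at x of a Gaussian rv with the given mean and invertible covariance R."""
    d = x - mean
    quad = d @ np.linalg.solve(R, d)
    norm_const = np.sqrt((2.0 * np.pi) ** len(x) * np.linalg.det(R))
    return float(np.exp(-0.5 * quad) / norm_const)

def translation_density_ratio(x, m, R):
    """Closed form exp(x' R^{-1} m - m' R^{-1} m / 2) of the Radon-Nikodym derivative."""
    Rinv_m = np.linalg.solve(R, m)
    return float(np.exp(x @ Rinv_m - 0.5 * (m @ Rinv_m)))
```

The normalising constants cancel in the ratio because both laws share the covariance $R$; only the quadratic forms in the exponents differ.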

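The basic framework for the discrete-time Girsanov transformation is
as follows: The underlying probability space $(\Omega, \mathcal{F}, P^o)$ is equipped with
the filtration $\{\mathcal{F}_t;\ t = 0, 1, \ldots\}$ of $\mathcal{F}$, i.e., $\{\mathcal{F}_t;\ t = 0, 1, \ldots\}$ is an increasing
family of sub-$\sigma$-fields of $\mathcal{F}$. Let $U = \{U_t;\ t = 1, 2, \ldots\}$ be an
$\mathbb{R}^n$-valued zero-mean $(\mathcal{F}_t, P^o)$-GWN sequence with correlation structure
$\Lambda = \{\Lambda_t;\ t = 1, 2, \ldots\}$, i.e., for all $t = 0, 1, \ldots$, the rv $U_{t+1}$ is $\mathcal{F}_{t+1}$-measurable
and

$$E^o\left[ \exp[i\theta' U_{t+1}] \mid \mathcal{F}_t \right] = \exp\left[ -\tfrac{1}{2} \theta' \Lambda_{t+1} \theta \right], \qquad \theta \in \mathbb{R}^n,\ t = 0, 1, \ldots \tag{3.1}$$

For future reference, we note the well-known fact that (3.1) more generally
holds for $\theta$ in $\mathbb{C}^n$.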

A case of special interest arises when in this definition, the filtration
is taken to be the natural filtration $\{\mathcal{F}^U_t;\ t = 0, 1, \ldots\}$ induced by the
sequence $U$, i.e.,

$$\mathcal{F}^U_{t+1} := \sigma\{U_{s+1};\ s = 0, 1, \ldots, t\}, \qquad t = 0, 1, \ldots \tag{3.2}$$

with $\mathcal{F}^U_0$ chosen so that $\mathcal{F}^U_0 \subset \mathcal{F}^U_1$; in fact, $\mathcal{F}^U_0$ is often selected to be
the trivial $\sigma$-field on $\Omega$. With the choice (3.2), we simply refer to $U$ as
a zero-mean $P^o$-GWN sequence with correlation structure $\Lambda$; moreover,
when the reference probability measure $P^o$ is clear from the context, we
further omit it from the terminology. This is in agreement with usage in
earlier sections, since (3.1) always implies that the rvs $\{U_t;\ t = 1, 2, \ldots\}$ are
mutually independent. We also say that for any $T > 0$, a finite collection
$\{U_t;\ t = 1, 2, \ldots, T + 1\}$ of rvs is an $(\mathcal{F}_t, P^o)$-GWN sequence if (3.1) holds
for all $t = 0, 1, \ldots, T$.

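Now, for a given $\mathcal{F}_t$-adapted $\mathbb{R}^n$-valued sequence $\chi = \{\chi_t;\ t =
0, 1, \ldots\}$, we define the sequences $\bar U = \{\bar U_t;\ t = 1, 2, \ldots\}$ and $L = \{L_t;\ t =
0, 1, \ldots\}$ taking values in $\mathbb{R}^n$ and $\mathbb{R}$, respectively, by

$$\bar U_{t+1} := U_{t+1} - \Lambda_{t+1} \chi_t, \qquad t = 0, 1, \ldots \tag{3.3}$$

and

$$L_0 = 1, \qquad L_{t+1} := \prod_{s=0}^{t} \exp\left[ \chi_s' U_{s+1} - \tfrac{1}{2} \chi_s' \Lambda_{s+1} \chi_s \right], \qquad t = 0, 1, \ldots \tag{3.4}$$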
The first key fact that underlies the Girsanov transformation is given in the
following Lemma:

Lemma 3.1. The sequence $L$ of positive rvs constitutes an $(\mathcal{F}_t, P^o)$-martingale,
with

$$E^o[L_t] = 1, \qquad t = 0, 1, \ldots \tag{3.5}$$


Proof. Fix $t = 0, 1, \ldots$. The rv $L_{t+1}$ being positive, its (conditional)
expectations are well defined, though not a priori finite. From the relation

$$L_{t+1} = L_t \cdot \exp\left[ \chi_t' U_{t+1} - \tfrac{1}{2} \chi_t' \Lambda_{t+1} \chi_t \right] \tag{3.6}$$

we obtain

$$E^o[L_{t+1} \mid \mathcal{F}_t] = L_t \cdot \exp\left[ -\tfrac{1}{2} \chi_t' \Lambda_{t+1} \chi_t \right] E^o\left[ \exp[\chi_t' U_{t+1}] \mid \mathcal{F}_t \right] \tag{3.7}$$

as both the rvs $L_t$ and $\chi_t$ are $\mathcal{F}_t$-measurable. Since the process $U$ is
a zero-mean $(\mathcal{F}_t, P^o)$-GWN sequence with correlation structure $\Lambda$, the
conditional distribution of $U_{t+1}$ given $\mathcal{F}_t$ is that of a zero-mean Gaussian
rv with covariance matrix $\Lambda_{t+1}$. Therefore, we have

$$E^o\left[ \exp[\chi_t' U_{t+1}] \mid \mathcal{F}_t \right] = \exp\left[ \tfrac{1}{2} \chi_t' \Lambda_{t+1} \chi_t \right] \tag{3.8}$$

using again the $\mathcal{F}_t$-measurability of the rv $\chi_t$. Substituting (3.8) into (3.7)
we obtain the martingale property in the form

$$E^o[L_{t+1} \mid \mathcal{F}_t] = L_t \qquad P^o\text{-a.s.}$$

whence

$$E^o[L_{t+1}] = E^o[L_t]. \tag{3.9}$$

The equality (3.5) is now a simple consequence of (3.9) and of the fact that
$L_0 = 1$. ■
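
Lemma 3.1 can be checked numerically in a small case. The sketch below assumes a scalar sequence with two steps, a constant $\chi_0 = c_0$ and an adapted $\chi_1 = g(U_1)$, and evaluates $E^o[L_2]$ by a tensor-product Gauss-Hermite rule instead of simulation. The function name `expected_L2` and its arguments are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expected_L2(c0, g, lam1, lam2, nodes=60):
    """E[L_2] from (3.4) for scalar U_1 ~ N(0, lam1), U_2 ~ N(0, lam2),
    with chi_0 = c0 (deterministic) and chi_1 = g(U_1) (adapted)."""
    z, w = hermegauss(nodes)               # rule for the weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)           # normalise to a N(0, 1) expectation
    u1 = np.sqrt(lam1) * z[:, None]        # column: values of U_1
    u2 = np.sqrt(lam2) * z[None, :]        # row: values of U_2
    chi1 = g(u1)                           # depends on U_1 only, hence adapted
    L1 = np.exp(c0 * u1 - 0.5 * c0 ** 2 * lam1)
    L2 = L1 * np.exp(chi1 * u2 - 0.5 * chi1 ** 2 * lam2)   # relation (3.6)
    return float(np.sum(w[:, None] * w[None, :] * L2))
```

For instance, `expected_L2(0.7, np.tanh, 1.5, 0.8)` should return a value numerically indistinguishable from 1, in line with (3.5). The inner sum over $U_2$ reproduces (3.8) for each fixed value of $U_1$, which is exactly how the proof proceeds.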

For each fixed integer $T = 0, 1, \ldots$, we now define a measure $P_{T+1}$ on
$(\Omega, \mathcal{F})$ by

$$P_{T+1}(A) := \int_A L_{T+1} \, dP^o, \qquad A \in \mathcal{F}. \tag{3.10}$$

It follows from (3.5) in Lemma 3.1 that the measure $P_{T+1}$ is indeed a
probability measure on $\mathcal{F}$; we denote by $E_{T+1}$ the expectation operator
associated with $P_{T+1}$. A simple martingale argument in conjunction with
the fact that $L_0 = 1$ shows that $P_{T+1}$ agrees with $P^o$ on $\mathcal{F}_0$. Also, since
$L_{T+1}$ is nonzero, we see that $P_{T+1}$ is mutually absolutely continuous with
$P^o$ on $\mathcal{F}$; the Radon-Nikodym derivatives are given by

$$\frac{dP_{T+1}}{dP^o} = L_{T+1} \qquad \text{and} \qquad \frac{dP^o}{dP_{T+1}} = L_{T+1}^{-1}.$$

As a consequence of this last fact, the statements $P^o$-a.s. and $P_{T+1}$-a.s.
are equivalent and reference to the underlying probability measure is usually
dropped. Moreover, for notational convenience we omit a.s. equivalencies
in the ensuing discussion, as such omissions have no effect upon the final
results.

This change of measure which replaces the base measure $P^o$ by $P_{T+1}$ is
what is referred to in the literature as the Girsanov measure transformation.
Its most important properties are summarized below.

Theorem 3.2. With the notation and definitions given above, we have the
following facts:

(a) The sequence $\{\bar U_t;\ t = 1, 2, \ldots, T + 1\}$ is a zero-mean $(\mathcal{F}_t, P_{T+1})$-GWN
process with covariance structure given by

$$E_{T+1}[\bar U_{t+1} \bar U_{t+1}' \mid \mathcal{F}_t] = \Lambda_{t+1}, \qquad t = 0, 1, \ldots, T \tag{3.11}$$

and

(b) The rvs $\{L_t^{-1};\ t = 0, 1, \ldots, T + 1\}$ form an $(\mathcal{F}_t, P_{T+1})$-martingale.

Proof. The rvs $\{\bar U_t;\ t = 1, 2, \ldots, T + 1\}$ are clearly $\mathcal{F}_t$-adapted. Therefore
Claim (a) will be proved if we can show that

$$E_{T+1}\left[ \exp[i\theta' \bar U_{t+1}] \mid \mathcal{F}_t \right] = \exp\left[ -\tfrac{1}{2} \theta' \Lambda_{t+1} \theta \right], \qquad \theta \in \mathbb{R}^n,\ t = 0, 1, \ldots, T. \tag{3.12}$$

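Fix $t = 0, 1, \ldots, T$ and $\theta$ in $\mathbb{R}^n$. Invoking a standard result on the evaluation
of conditional expectations under an absolutely continuous change
of measures [18, Sec. 27.4] (the so-called Kallianpur-Striebel formula of
nonlinear filtering), we have

$$E_{T+1}\left[ \exp[i\theta' \bar U_{t+1}] \mid \mathcal{F}_t \right] = \frac{E^o\left[ \exp[i\theta' \bar U_{t+1}] L_{T+1} \mid \mathcal{F}_t \right]}{E^o\left[ L_{T+1} \mid \mathcal{F}_t \right]} \tag{3.13}$$

and the remainder of the proof consists in evaluating the numerator and
denominator of (3.13).

By the martingale property of Lemma 3.1, we first see that

$$E^o[L_{T+1} \mid \mathcal{F}_t] = L_t. \tag{3.14}$$

Next, using the smoothing property of conditional expectations, we readily
get

$$E^o\left[ \exp[i\theta' \bar U_{t+1}] L_{T+1} \mid \mathcal{F}_t \right]$$
$$= E^o\left[ E^o\left[ \exp[i\theta' \bar U_{t+1}] L_{T+1} \mid \mathcal{F}_{t+1} \right] \mid \mathcal{F}_t \right]$$
$$= E^o\left[ \exp[i\theta' \bar U_{t+1}] \cdot E^o[L_{T+1} \mid \mathcal{F}_{t+1}] \mid \mathcal{F}_t \right] \tag{3.15}$$
$$= E^o\left[ \exp[i\theta' \bar U_{t+1}] L_{t+1} \mid \mathcal{F}_t \right] \tag{3.16}$$
$$= L_t \cdot \exp\left[ -\tfrac{1}{2} \chi_t' \Lambda_{t+1} \chi_t \right] E^o\left[ \exp[i\theta' \bar U_{t+1} + \chi_t' U_{t+1}] \mid \mathcal{F}_t \right] \tag{3.17}$$
$$= L_t \cdot \exp\left[ -\tfrac{1}{2} \chi_t' \Lambda_{t+1} \chi_t - i\theta' \Lambda_{t+1} \chi_t \right] E^o\left[ \exp[(i\theta + \chi_t)' U_{t+1}] \mid \mathcal{F}_t \right]. \tag{3.18}$$

The passage from (3.15) to (3.16) is validated by the martingale property
of $L$, while (3.17) follows from (3.16) upon using (3.6) and the $\mathcal{F}_t$-measurability
of the rv $\chi_t$; substituting (3.3) into (3.17) yields (3.18). Finally,
using the first comment that followed (3.1), we see that

$$E^o\left[ \exp[(i\theta + \chi_t)' U_{t+1}] \mid \mathcal{F}_t \right] = \exp\left[ -\tfrac{1}{2} (\theta - i\chi_t)' \Lambda_{t+1} (\theta - i\chi_t) \right]$$

and substitution of this last fact into (3.18) yields

$$E^o\left[ \exp[i\theta' \bar U_{t+1}] L_{T+1} \mid \mathcal{F}_t \right] = L_t \cdot \exp\left[ -\tfrac{1}{2} \theta' \Lambda_{t+1} \theta \right]. \tag{3.19}$$

We now obtain (3.12) by inserting (3.14) and (3.19) into (3.13).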

To establish Claim (b), we observe that the base measure $P^0$ itself can be obtained from the transformed measure $P_{T+1}$ by a Girsanov measure transformation: Indeed, we rewrite (3.3) as

$$ U^0_{t+1} = U_{t+1} + \Lambda_{t+1}\chi_t, \quad t = 0, 1, \ldots $$

where it is now known by the first part of the proof that the sequence $\{U_t;\ t = 1, 2, \ldots, T+1\}$ is a zero-mean $(\mathcal{F}_t, P_{T+1})$-GWN process with covariance structure (3.11). Therefore, in analogy with (3.4), with $U_{t+1}$ and $-\Lambda_{t+1}\chi_t$ playing the role of $U^0_{t+1}$ and $\Lambda_{t+1}\chi_t$, respectively, for all $t = 0, 1, \ldots, T$, we define the sequence $\bar{L} = \{\bar{L}_t;\ t = 0, 1, \ldots\}$ of $\mathbb{R}$-valued rvs by

$$ \bar{L}_0 = 1, \quad \bar{L}_{t+1} := \prod_{s=0}^{t} \exp\left[-\chi_s' U_{s+1} - \tfrac{1}{2}\chi_s'\Lambda_{s+1}\chi_s\right]. \quad t = 0, 1, \ldots \tag{3.20} $$

By Lemma 3.1, the rvs $\{\bar{L}_t;\ t = 0, 1, \ldots, T+1\}$ form an $(\mathcal{F}_t, P_{T+1})$-martingale, and the proof of Claim (b) is now completed upon observing that

$$ \bar{L}_t = L_t^{-1}. \quad t = 0, 1, \ldots \tag{3.21} $$

■ 
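
The identity (3.21) holds path by path, so it can be checked numerically on a single sample path. The following is a minimal scalar sketch, not part of the original development: it assumes that (3.3) reads $U_{t+1} = U^0_{t+1} - \Lambda_{t+1}\chi_t$ (the form implied by the rewriting used in the proof above), and the names `girsanov_densities`, `chi`, `u_base` and `lam` are hypothetical.

```python
import math
import random


def girsanov_densities(chi, u_base, lam):
    """Return (L, L_bar) after len(chi) steps for scalar sequences.

    chi[s]    : translation coefficient chi_s (illustrative values)
    u_base[s] : base noise sample U^0_{s+1}
    lam[s]    : variance Lambda_{s+1} > 0
    """
    log_l = 0.0      # log L_{t+1}, summed as in (3.22)
    log_l_bar = 0.0  # log of the product in (3.20)
    for c, u0, v in zip(chi, u_base, lam):
        u = u0 - v * c  # translated noise (assumed form of (3.3))
        log_l += c * u0 - 0.5 * c * v * c
        log_l_bar += -c * u - 0.5 * c * v * c
    return math.exp(log_l), math.exp(log_l_bar)


rng = random.Random(0)
lam = [1.0 + 0.1 * s for s in range(5)]
u_base = [rng.gauss(0.0, math.sqrt(v)) for v in lam]
chi = [0.3, -0.2, 0.5, 0.1, -0.4]
L, L_bar = girsanov_densities(chi, u_base, lam)
print(L * L_bar)  # equals 1 up to floating-point rounding
```

The check uses fixed numbers for `chi`; an adapted random choice would work equally well, since the cancellation happens term by term in the exponent.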

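We conclude this section with an easy observation: An alternate expression for (3.4) is simply

$$ L_{t+1} = \exp\left[\sum_{s=0}^{t}\left\{\chi_s' U^0_{s+1} - \tfrac{1}{2}\chi_s'\Lambda_{s+1}\chi_s\right\}\right], \quad t = 0, 1, \ldots \tag{3.22} $$

and similarly from (3.20)-(3.21), we get

$$ \bar{L}_{t+1} = \exp\left[\sum_{s=0}^{t}\left\{-\chi_s' U_{s+1} - \tfrac{1}{2}\chi_s'\Lambda_{s+1}\chi_s\right\}\right]. \quad t = 0, 1, \ldots $$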


Theorem 3.2 implies that the probability measures $\{P_T;\ T = 1, 2, \ldots\}$ obey a consistency property: if $T$ and $T'$ are finite times with $0 < T < T'$, then $\{U_t;\ t = 1, 2, \ldots, T+1\}$ has the same statistics under both $P_{T+1}$ and $P_{T'+1}$; more completely, $P_{T+1}$ and $P_{T'+1}$ agree on $\mathcal{F}_0 \vee \sigma\{U_t;\ t = 1, 2, \ldots, T+1\}$. The reader may then ask if, by setting $T = \infty$ in (3.10), we may find a single probability measure $P$ under which Claim (a) of Theorem 3.2 is true for all $T$. Unfortunately, the existence of such a 'projective limit' of the measures $\{P_T;\ T = 1, 2, \ldots\}$ is rare. In fact, such a probability measure will exist if and only if $L$ is a uniformly integrable martingale; the reader is referred to [27, Thm. 2.1 and 23, Props. III-1-1 and IV-2-3] for a more detailed analysis of this question. The absence of such a limiting probability measure $P$ will not, however, cause any difficulties in our efforts here. We shall be considering the one-step prediction problem for (1.1a)-(1.1b) on finite horizons, which involve only finite subsets of the rvs $\{\xi, (W^0_t, V^0_t);\ t = 0, 1, \ldots\}$. We will touch upon this matter again in Section V.


IV.  THE  UNCORRELATED  CASE:  A  REVIEW 

We  first  review  the  solution  of  the  one-step  prediction  problem  (i.e., 
evaluating  the  conditional  expectations  (1.4))  in  the  simpler  case  when  the 
plant  and  observation  noise  sequences  are  uncorrelated  and  the  observa¬ 
tion  noise  sequence  is  standard.  In  other  words,  we  temporarily  replace 
assumption  (A.l)  by  assumption  (A.l)*,  where 

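(A.1)*: The process $(W^0, V^0)$ is a zero-mean GWN sequence with covariance structure $\Sigma$ given by

$$ \Sigma_t := \mathrm{Cov}\begin{pmatrix} W^0_t \\ V^0_t \end{pmatrix} = \begin{pmatrix} \Sigma^w_t & O_{n \times k} \\ O_{k \times n} & I_k \end{pmatrix}. \quad t = 1, 2, \ldots $$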


The situation defined by (A.1)* is essentially the discrete-time analogue of the one discussed in [19, 22], and as was done there, the arguments will rely crucially on the Girsanov measure transformation; this time, of course, the discrete-time version, which was presented in Section III, will be used.
As  our  purpose  here  is  to  provide  some  motivation  and  background  for  the 
more  complicated  arguments  of  Section  V,  we  review  below  the  various 
steps  leading  to  the  relevant  discrete-time  Girsanov  transformation.  In 
doing  so,  we  are  careful  to  explicitly  point  out  the  essential  features  of  our 
reasoning. 

We wish to emphasize once again that the only source of non-Gaussian randomness in the model comes from the initial condition $\xi$. If $\xi$ were a Gaussian rv, then (1.8) would hold and the generalized one-step prediction problem would be fully described by (1.6)-(1.7). Furthermore, the MMSE and LLSE estimation processes $X$ and $X^K$ would coincide and the error process $\varepsilon$ would be identically zero, whence the asymptotic analysis of Sections VI-IX would be trivial. Given this, our guiding principle is to effect a decomposition of (1.1a)-(1.1b) so that, as much as possible, we may separate the effects of the Gaussian white noise sequence $(W^0, V^0)$ from the troublesome effects of the non-Gaussian initial condition $\xi$.

We  begin  by  noting  that  the  closed-form  solution  of  the  recursive  equa¬ 
tion  (1.1a)  is  simply 

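$$ X^0_t = \Phi(t,0)\xi + \sum_{s=0}^{t-1}\Phi(t,s+1)W^0_{s+1}, \quad t = 1, 2, \ldots $$

where $\Phi$ is the state transition matrix defined by

$$ \Phi(s,s) = I_n, \quad \Phi(t+1,s) = A_t\Phi(t,s), \quad s \le t. \quad s, t = 0, 1, \ldots \tag{4.1} $$

This suggests the decomposition

$$ X^0_t = Z_t + X_t, \quad t = 1, 2, \ldots \tag{4.2} $$

where the processes $Z$ and $X$ are given by

$$ Z_0 = \xi, \quad Z_t = \Phi(t,0)\xi, \quad t = 1, 2, \ldots \tag{4.3} $$

and

$$ X_0 = 0, \quad X_t = \sum_{s=0}^{t-1}\Phi(t,s+1)W^0_{s+1}. \quad t = 1, 2, \ldots \tag{4.4} $$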

The effects of the non-Gaussian initial condition $\xi$ are encoded in the process $Z$, while $X$ is a Gaussian sequence. For future reference, we note that the evolution of the processes $Z$ and $X$ is also described by the dynamical equations

$$ Z_0 = \xi, \quad Z_{t+1} = A_t Z_t, \quad t = 0, 1, \ldots \tag{4.5} $$

and

$$ X_0 = 0, \quad X_{t+1} = A_t X_t + W^0_{t+1}. \quad t = 0, 1, \ldots \tag{4.6} $$

Using the decomposition (4.2), we can write (1.1b) as

$$ Y_t = H_t X_t + H_t Z_t + V^0_{t+1} = H_t X_t + V_{t+1}, \quad t = 0, 1, \ldots \tag{4.7} $$

where we have set

$$ V_{t+1} = V^0_{t+1} + H_t Z_t. \quad t = 0, 1, \ldots $$

Therefore the observation process $Y$ is the sum of the $P^0$-Gaussian process $\{H_t X_t;\ t = 0, 1, \ldots\}$ and of the sequence $V$ which can be interpreted as the translation of the $P^0$-GWN sequence $V^0$ by the process $\{H_t Z_t;\ t = 0, 1, \ldots\}$. This simple observation suggests that after an appropriate Girsanov transformation to be determined, the noise sequence $V$ can be made to look like a GWN sequence under the transformed measure.
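
The decomposition (4.2)-(4.7) is a pathwise identity and can be checked by simulation. The sketch below is illustrative only: it assumes scalar, time-invariant coefficients `a` and `h` in place of $A_t$ and $H_t$, reads the model off (4.6)-(4.7) as $X^0_{t+1} = aX^0_t + W^0_{t+1}$ and $Y_t = hX^0_t + V^0_{t+1}$, and uses a hypothetical two-point distribution for $\xi$.

```python
import random


def simulate_split(a, h, xi, w, v0):
    """Run the original model next to the split (4.2) for scalar constants a, h.

    w[t] is W^0_{t+1} and v0[t] is V^0_{t+1}; returns one row per time t.
    """
    x_full, z, x = xi, xi, 0.0        # X^0_0 = xi, Z_0 = xi, X_0 = 0
    rows = []
    for w_next, v0_next in zip(w, v0):
        y = h * x_full + v0_next      # observation Y_t
        v = v0_next + h * z           # translated noise V_{t+1}
        rows.append((x_full, z, x, y, v))
        x_full = a * x_full + w_next  # original state recursion
        z = a * z                     # (4.5): carries the initial condition
        x = a * x + w_next            # (4.6): Gaussian part
    return rows


a, h = 0.9, 1.5
rng = random.Random(1)
xi = rng.choice([-2.0, 2.0])          # a non-Gaussian initial condition
w = [rng.gauss(0.0, 1.0) for _ in range(6)]
v0 = [rng.gauss(0.0, 1.0) for _ in range(6)]
rows = simulate_split(a, h, xi, w, v0)
for x_full, z, x, y, v in rows:
    # Both residuals vanish up to rounding: (4.2) and (4.7).
    print(x_full - (z + x), y - (h * x + v))
```

Nothing here depends on the distribution of the noise samples; the Gaussian assumption matters only for the change of measure that follows.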

Our  next  step  consists  of  using  the  Girsanov  measure  transformation 
to  see  that  the  law  of  the  translated  Gaussian  process  V  is  absolutely  con¬ 
tinuous  with  respect  to  the  law  of  a  centered  Gaussian  process;  this  will  be 
made  more  precise  in  a  moment.  As  the  end  result  of  this  measure  trans¬ 
formation,  we  can  consider  a  new  probability  measure  under  which  V  (as 
opposed  to  V°)  is  now  a  zero-mean  standard  GWN.  In  short,  the  effects  of 
the  non-Gaussian  initial  condition  have  been  pushed  into  a  Radon-Nikodym 
derivative. 

For the situation at hand, the processes $V^0$ and $\{H_t Z_t;\ t = 0, 1, \ldots\}$ play the role of the processes $U^0$ and $\chi$ of Section III, respectively. To complete the preparations for the Girsanov transformation, we introduce the filtration $\{\mathcal{F}_t;\ t = 0, 1, \ldots\}$ by setting

$$ \mathcal{F}_0 := \sigma\{\xi, W^0_s;\ s = 1, 2, \ldots\} $$

and

$$ \mathcal{F}_t := \mathcal{F}_0 \vee \sigma\{V^0_s;\ s = 1, 2, \ldots, t\}. \quad t = 1, 2, \ldots $$
Since the processes $W^0$ and $V^0$ are uncorrelated (and thus independent since Gaussian), $V^0$ will indeed be a zero-mean $(\mathcal{F}_t, P^0)$-GWN sequence. (When considering the correlated case in Section V, $V^0$ will not be a $(\mathcal{F}_t, P^0)$-GWN sequence for this definition of $\mathcal{F}_t$. This will be the main hurdle we shall confront.)

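Using (3.22), we see that the sequence $L$ of Radon-Nikodym derivatives of (3.4) is here given by $L_0 := 1$ and

$$ L_{t+1} := \exp\left[-\sum_{s=0}^{t}\xi'[H_s\Phi(s,0)]' V^0_{s+1} - \tfrac{1}{2}\sum_{s=0}^{t}\xi'[H_s\Phi(s,0)]'[H_s\Phi(s,0)]\xi\right]. \quad t = 0, 1, \ldots \tag{4.8} $$

For any nonnegative integer $T$, we then define the probability measure $P_{T+1}$ by

$$ P_{T+1}(A) := \int_A L_{T+1}\, dP^0, \quad A \in \mathcal{F}. \tag{4.9} $$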

Applying  the  results  of  Section  III,  we  readily  conclude  the  following  facts: 

(G) The $P_{T+1}$-statistics of the rvs $\{\xi, W^0_{r+1};\ r = 0, 1, \ldots, V_{s+1};\ s = 0, 1, \ldots, T\}$ are the same as the $P^0$-statistics of the rvs $\{\xi, W^0_{r+1};\ r = 0, 1, \ldots, V^0_{s+1};\ s = 0, 1, \ldots, T\}$. In particular, under the transformed measure $P_{T+1}$, the rv $\xi$ has distribution $F$ and is independent of the rvs $\{W^0_{r+1};\ r = 0, 1, \ldots;\ V_{s+1};\ s = 0, 1, \ldots, T\}$ which are zero-mean Gaussian rvs with known covariance structure.

We  can  summarize  thus  far  the  effects  of  the  decomposition  (4.2)  and  of 
the  Girsanov  transformation  (4.8)-(4.9):  The  observation  process  Y  can 
be  viewed  as  the  sum  of  a  Gaussian  process  and  of  a  translated  Gaussian 
process,  this  translate  being  amenable  to  a  Girsanov  transformation  which 
results  in  property  (G). 

We now turn to the evaluation of the conditional expectation (1.4) for some fixed time index $t = 0, 1, \ldots$, and for some given bounded Borel mappings $\varphi : \mathbb{R}^n \to \mathbb{C}$. Fixing the time horizon $T$ so that $t \le T$, we consider the change of measure defined by (4.9) and seek to evaluate (1.4) by performing the calculations under the transformed measure $P_{T+1}$ rather than under $P^0$. To do this, as in Section III, we resort to the Kallianpur-Striebel formula [18, Sec. 27.4], which here takes the form

$$ E^0\left[\varphi(X^0_{t+1}) \mid \mathcal{Y}_t\right] = \frac{E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{T+1} \mid \mathcal{Y}_t\right]}{E_{T+1}\left[\bar{L}_{T+1} \mid \mathcal{Y}_t\right]} \tag{4.10} $$

since

$$ \frac{dP^0}{dP_{T+1}} = \bar{L}_{T+1}. $$

Our problem thus reduces to the evaluation of expressions of the form

$$ E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{T+1} \mid \mathcal{Y}_t\right] $$

for all bounded Borel mappings $\varphi : \mathbb{R}^n \to \mathbb{C}$.

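First, an easy preliminary simplification: The rv $X^0_{t+1}$ is $\mathcal{F}_{t+1}$-measurable and $\mathcal{Y}_t$ is also contained in $\mathcal{F}_{t+1}$, whence iterated conditioning and the martingale property of Claim (b) in Theorem 3.2 readily imply the equality

$$
\begin{aligned}
E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{T+1} \mid \mathcal{Y}_t\right]
&= E_{T+1}\left[E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{T+1} \mid \mathcal{F}_{t+1}\right] \mid \mathcal{Y}_t\right] \\
&= E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{t+1} \mid \mathcal{Y}_t\right].
\end{aligned} \tag{4.11}
$$

By another argument based on iterated conditioning, we also observe that

$$ E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{t+1} \mid \mathcal{Y}_t\right] = E_{T+1}\left[E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{t+1} \mid \mathcal{Y}_t \vee \sigma\{\xi\}\right] \mid \mathcal{Y}_t\right]. \tag{4.12} $$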

The importance of this formula stems from the fact that under $P_{T+1}$, the rv $\xi$ has known statistics and is independent of $\mathcal{Y}_t$. (Reviewing our arguments, we see that one essential feature of the decomposition (4.2) and of the Girsanov transformation was that it allowed the non-Gaussian effects of $\xi$ to be put into the observation noise. Thus the randomness in the observations comes from the Gaussian plant noise and the observation noise, which is also Gaussian under the new measure. Making $\xi$ measurable with respect to $\mathcal{F}_0$ makes $\xi$ independent of these noise sequences and thus of the process $Y$.)

To proceed with the evaluation of the inner conditional expectation on the right-hand side of (4.12), we shall write (4.8) more compactly. We define the $\mathbb{R}^n$-valued process $B = \{B_t;\ t = 0, 1, \ldots\}$ by

$$ B_0 := 0, \quad B_{t+1} := \sum_{s=0}^{t}[H_s\Phi(s,0)]' V_{s+1}, \quad t = 0, 1, \ldots \tag{4.13} $$

and the $\mathcal{Q}_n$-valued sequence $M = \{M_t;\ t = 0, 1, \ldots\}$ by

$$ M_0 := O_n, \quad M_{t+1} := \sum_{s=0}^{t}[H_s\Phi(s,0)]'[H_s\Phi(s,0)]. \quad t = 0, 1, \ldots \tag{4.14} $$
s=0 
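With this notation, (4.8) now becomes

$$ \bar{L}_{t+1} = \exp\left[\xi' B_{t+1} - \tfrac{1}{2}\xi' M_{t+1}\xi\right], \quad t = 0, 1, \ldots \tag{4.15} $$

so that, combining (4.2)-(4.3), we get

$$
\begin{aligned}
E_{T+1}&\left[\varphi(X^0_{t+1})\,\bar{L}_{t+1} \mid \mathcal{Y}_t \vee \sigma\{\xi\}\right] \\
&= \exp\left[-\tfrac{1}{2}\xi' M_{t+1}\xi\right] \cdot E_{T+1}\left[\varphi(X_{t+1} + \Phi(t+1,0)\xi)\exp[\xi' B_{t+1}] \mid \mathcal{Y}_t \vee \sigma\{\xi\}\right].
\end{aligned} \tag{4.16}
$$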

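Returning to (G), which describes the statistics of the relevant rvs under $P_{T+1}$, we argue that the evaluation of (4.16) is conceptually a simple matter: Indeed, from (G) we observe that the rvs $\{X_{t+1}, B_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ can be expressed as linear combinations of the rvs $\{W^0_{s+1};\ s = 0, 1, \ldots, V_{r+1};\ r = 0, 1, \ldots, T\}$, and under $P_{T+1}$ are thus jointly Gaussian and independent of $\xi$. Therefore, as a first consequence of these facts, we can write

$$
\begin{aligned}
E_{T+1}&\left[\varphi(X_{t+1} + \Phi(t+1,0)\xi)\exp[\xi' B_{t+1}] \mid \mathcal{Y}_t \vee \sigma\{\xi\}\right] \\
&= E_{T+1}\left[\varphi(X_{t+1} + \Phi(t+1,0)z)\exp[z' B_{t+1}] \mid \mathcal{Y}_t\right]\Big|_{z = \xi}.
\end{aligned} \tag{4.17}
$$

Next, defining the conditional expectations

$$ \hat{X}_{t+1} := E_{T+1}[X_{t+1} \mid \mathcal{Y}_t] \quad \text{and} \quad \hat{B}_{t+1} := E_{T+1}[B_{t+1} \mid \mathcal{Y}_t] \tag{4.18} $$

with corresponding errors

$$ \tilde{X}_{t+1} := X_{t+1} - \hat{X}_{t+1} \quad \text{and} \quad \tilde{B}_{t+1} := B_{t+1} - \hat{B}_{t+1}, \tag{4.19} $$

we see that

$$
\begin{aligned}
E_{T+1}&\left[\varphi(X_{t+1} + \Phi(t+1,0)z)\exp[z' B_{t+1}] \mid \mathcal{Y}_t\right] \\
&= \exp[z'\hat{B}_{t+1}] \cdot E_{T+1}\left[\varphi(\tilde{X}_{t+1} + x + \Phi(t+1,0)z)\exp[z'\tilde{B}_{t+1}] \mid \mathcal{Y}_t\right]\Big|_{x = \hat{X}_{t+1}}, \quad z \in \mathbb{R}^n.
\end{aligned} \tag{4.20}
$$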
The evaluation of (4.20) requires only the $P_{T+1}$-conditional distribution of the pair $(\tilde{X}_{t+1}, \tilde{B}_{t+1})$ given the Gaussian rvs $\{Y_0, Y_1, \ldots, Y_t\}$. Since the rvs $\{X_{t+1}, B_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ are jointly $P_{T+1}$-Gaussian, it is well known [6, p. 10] that under $P_{T+1}$, the pair $(\tilde{X}_{t+1}, \tilde{B}_{t+1})$ is independent of the rvs $\{Y_0, Y_1, \ldots, Y_t\}$ and has a Gaussian distribution with zero mean and covariance matrix $S_{t+1}$ given by

$$ S_{t+1} := E_{T+1}\left[\begin{pmatrix}\tilde{X}_{t+1} \\ \tilde{B}_{t+1}\end{pmatrix}\begin{pmatrix}\tilde{X}_{t+1} \\ \tilde{B}_{t+1}\end{pmatrix}'\right]. \tag{4.21} $$


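Therefore, using this fact in (4.20), we find

$$ E_{T+1}\left[\varphi(X_{t+1} + \Phi(t+1,0)z)\exp[z' B_{t+1}] \mid \mathcal{Y}_t\right] = \exp[z'\hat{B}_{t+1}]\ \mathcal{T}_\varphi\left[\hat{X}_{t+1} + \Phi(t+1,0)z,\ z;\ S_{t+1}\right], \quad z \in \mathbb{R}^n, $$

where we have used the notation (2.4). It is now plain from (4.17) that

$$ E_{T+1}\left[\varphi(X_{t+1} + \Phi(t+1,0)\xi)\exp[\xi' B_{t+1}] \mid \mathcal{Y}_t \vee \sigma\{\xi\}\right] = \exp[\xi'\hat{B}_{t+1}]\ \mathcal{T}_\varphi\left[\hat{X}_{t+1} + \Phi(t+1,0)\xi,\ \xi;\ S_{t+1}\right], $$

and going back to (4.16), we can now conclude that

$$ E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{t+1} \mid \mathcal{Y}_t \vee \sigma\{\xi\}\right] = \exp\left[\xi'\hat{B}_{t+1} - \tfrac{1}{2}\xi' M_{t+1}\xi\right] \cdot \mathcal{T}_\varphi\left[\hat{X}_{t+1} + \Phi(t+1,0)\xi,\ \xi;\ S_{t+1}\right]. \tag{4.22} $$

Finally, by averaging over $\xi$ as indicated by (4.12), we get the relation

$$ E_{T+1}\left[\varphi(X^0_{t+1})\,\bar{L}_{t+1} \mid \mathcal{Y}_t\right] = \mathcal{U}_\varphi\left[\hat{X}_{t+1}, \hat{B}_{t+1};\ M_{t+1}, \Phi(t+1,0);\ S_{t+1}\right] \tag{4.23} $$

where we have used the notation (2.5).

Combining (4.10) and (4.23), we obtain the representation result

$$ E^0\left[\varphi(X^0_{t+1}) \mid \mathcal{Y}_t\right] = \frac{\mathcal{U}_\varphi\left[\hat{X}_{t+1}, \hat{B}_{t+1};\ M_{t+1}, \Phi(t+1,0);\ S_{t+1}\right]}{\mathcal{U}_1\left[\hat{X}_{t+1}, \hat{B}_{t+1};\ M_{t+1}, \Phi(t+1,0);\ S_{t+1}\right]}. \tag{4.24} $$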


This formula is the discrete-time analogue of Theorem T5 in [22]. To finish the one-step prediction problem, we only need to calculate $\hat{X}_{t+1}$, $\hat{B}_{t+1}$, $M_{t+1}$, $\Phi(t+1,0)$ and $S_{t+1}$. We shall perform these computations in the more general case of Section V. For the reader eager to carry out in totality the calculations of this section, we note that the $P_{T+1}$-Gaussian rvs $\{(X_{t+1}, B_{t+1});\ t = 0, 1, \ldots, T\}$ described by (4.6) and (4.13), obey linear dynamics driven by a $P_{T+1}$-GWN sequence, and that for each $t = 0, 1, \ldots, T$, the observation $Y_t$ is a linear combination of the rv $(X_t, B_t)$ and of the $P_{T+1}$-Gaussian noise term $V_{t+1}$. Therefore, classical Kalman filtering theory applies and leads to recursive equations for the rvs $(\hat{X}_{t+1}, \hat{B}_{t+1})$ and the error covariance matrices $S_{t+1}$, $t = 0, 1, \ldots, T$.

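Before closing this section, let us continue the calculation of (4.22) in the specific case when $\varphi \equiv 1$; this will be needed in Section VI. Straightforward evaluations yield

$$ E_{T+1}\left[\bar{L}_{t+1} \mid \mathcal{Y}_t \vee \sigma\{\xi\}\right] = \exp\left[-\tfrac{1}{2}\xi'\left(M_{t+1} - E_{T+1}[\tilde{B}_{t+1}\tilde{B}_{t+1}']\right)\xi + \xi'\hat{B}_{t+1}\right]. \tag{4.25} $$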


V.  THE  CORRELATED  CASE 

We  now  turn  to  the  more  complicated  case  where  the  noise  sequences 
W°  and  V°  are  allowed  to  be  correlated  and  V°  to  be  nonstandard,  i.e., 
we  are  returning  to  the  general  assumption  (A.l). 

First, a few comments to guide the discussion of this more complex situation: We seek again a decomposition of the form (4.2), with the objective of separating the effects of the non-Gaussian initial condition $\xi$ from those of the GWN sequence $(V^0, W^0)$. However, as we review the arguments of Section IV, we readily see the main difficulty in arguing as was done there on the basis of the decomposition (4.2)-(4.4): Since $W^0$ and $V^0$ are correlated, we cannot expect in general that a Girsanov transformation will change the statistics of $V$ without changing the statistics of $W^0$. As a result, under the Girsanov transformation (4.8)-(4.9) based on the decomposition (4.2)-(4.4), the process $X$ defined by (4.4) will probably not retain its Gaussian character and the arguments of Section IV cannot be carried through.

The remedy to this difficulty is easily seen: We shall perform a Girsanov transformation on the joint $\mathbb{R}^{n+k}$-valued process $(W^0, V^0)$, instead of the GWN process $V^0$ alone (as was done in Section IV). We thus look for a new probability measure under which an appropriate translate of the joint process $(W^0, V^0)$ is Gaussian (up to a finite horizon). Of course, the search for this Girsanov transformation will be initiated via a decomposition of the form (4.2), where this time, the processes $X$ and $Z$ are yet to be determined, i.e., we do not a priori make the definitions (4.3) and (4.4) in the postulated decomposition

$$ X^0_t = X_t + Z_t. \quad t = 0, 1, \ldots \tag{5.1} $$

With  this  in  mind,  from  (1.1a)  and  (5.1),  we  first  obtain 

$$
\begin{aligned}
X_{t+1} + Z_{t+1} &= X^0_{t+1} \\
&= A_t X^0_t + W^0_{t+1} \\
&= A_t X_t + A_t Z_t + W^0_{t+1}. \quad t = 0, 1, \ldots
\end{aligned} \tag{5.2}
$$

Prompted by the remarks above, we tentatively define the $\mathbb{R}^n$-valued processes $X$ and $Z$ via the recursions

$$ X_0 = 0, \quad X_{t+1} = A_t X_t - \pi_t + W^0_{t+1}, \quad t = 0, 1, \ldots \tag{5.3} $$

and

$$ Z_0 = \xi, \quad Z_{t+1} = A_t Z_t + \pi_t, \quad t = 0, 1, \ldots \tag{5.4} $$

for some $\mathbb{R}^n$-valued process $\pi = \{\pi_t;\ t = 0, 1, \ldots\}$; these equations are compatible with (5.2) and generalize (4.5) and (4.6). Equation (4.7), being a direct consequence of the decomposition (4.2), still holds in our present case, i.e.,

$$ Y_t = H_t X_t + V_{t+1}, \quad t = 0, 1, \ldots $$

where we have again defined the sequence $V$ by

$$ V_{t+1} = V^0_{t+1} + H_t Z_t. \quad t = 0, 1, \ldots \tag{5.5} $$

Moreover, we observe that (5.3) can also be rewritten as

$$ X_0 = 0, \quad X_{t+1} = A_t X_t + W_{t+1}, \quad t = 0, 1, \ldots $$

if the $\mathbb{R}^n$-valued sequence $W$ is defined by

$$ W_{t+1} = W^0_{t+1} - \pi_t. \quad t = 0, 1, \ldots \tag{5.6} $$
Now, going back to the line of reasoning given in Section IV (as amended above), we must perform a Girsanov transformation on the $\mathbb{R}^{n+k}$-valued process $(W^0, V^0)$. The particular form of (5.5) and (5.6) suggests that the relevant Girsanov transformation is the one associated with translating the GWN sequence $(W^0, V^0)$ into the $\mathbb{R}^{n+k}$-valued process $(W, V)$. We do this as follows: First we define the filtration $\{\mathcal{F}_t;\ t = 0, 1, \ldots\}$ on $(\Omega, \mathcal{F})$ by $\mathcal{F}_0 := \sigma\{\xi\}$ and

$$ \mathcal{F}_t := \mathcal{F}_0 \vee \sigma\{(W^0_s, V^0_s);\ s = 1, 2, \ldots, t\}. \quad t = 1, 2, \ldots $$


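Under the enforced independence assumption in (A.3), the sequence $(W^0, V^0)$ is indeed an $(\mathcal{F}_t, P^0)$-GWN sequence. The feasibility of the translation mentioned above will be established if we can find two $\mathcal{F}_t$-adapted sequences $\psi^w = \{\psi^w_t;\ t = 0, 1, \ldots\}$ and $\psi^v = \{\psi^v_t;\ t = 0, 1, \ldots\}$ taking values in $\mathbb{R}^n$ and $\mathbb{R}^k$, respectively, such that

$$ \begin{pmatrix} W_{t+1} \\ V_{t+1} \end{pmatrix} = \begin{pmatrix} W^0_{t+1} \\ V^0_{t+1} \end{pmatrix} - \Sigma_{t+1}\begin{pmatrix} \psi^w_t \\ \psi^v_t \end{pmatrix}. \quad t = 0, 1, \ldots \tag{5.7} $$

Upon comparing this last relation with (5.5) and (5.6), we readily see that the processes $\psi^w$ and $\psi^v$ have to be selected so that

$$ -\Sigma^w_{t+1}\psi^w_t - \Sigma^{wv}_{t+1}\psi^v_t = -\pi_t, \quad t = 0, 1, \ldots $$

and

$$ -\Sigma^{vw}_{t+1}\psi^w_t - \Sigma^v_{t+1}\psi^v_t = H_t Z_t. \quad t = 0, 1, \ldots $$

Since $\Sigma^v_{t+1}$ is invertible, this can be achieved by choosing some as-yet-unspecified $\mathbb{R}^n$-valued sequence $\psi = \{\psi_t;\ t = 0, 1, \ldots\}$ which is $\mathcal{F}_t$-adapted, and by taking the processes $\psi^w$ and $\psi^v$ such that

$$ \psi^w_t = \psi_t \quad \text{and} \quad \psi^v_t = -(\Sigma^v_{t+1})^{-1}\left[\Sigma^{vw}_{t+1}\psi_t + H_t Z_t\right]. \quad t = 0, 1, \ldots $$

With this choice, the process $\pi$ is given by

$$ \pi_t = \left[\Sigma^w_{t+1} - \Sigma^{wv}_{t+1}(\Sigma^v_{t+1})^{-1}\Sigma^{vw}_{t+1}\right]\psi_t - \Sigma^{wv}_{t+1}(\Sigma^v_{t+1})^{-1}H_t Z_t. \quad t = 0, 1, \ldots $$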


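To simplify the Girsanov transformation as much as possible, we take

$$ \psi_t = 0, \quad t = 0, 1, \ldots $$

so that

$$ \psi^w_t = 0 \quad \text{and} \quad \psi^v_t = -(\Sigma^v_{t+1})^{-1}H_t Z_t, \quad t = 0, 1, \ldots \tag{5.8} $$

with

$$ \pi_t = -\Sigma^{wv}_{t+1}(\Sigma^v_{t+1})^{-1}H_t Z_t. \quad t = 0, 1, \ldots \tag{5.9} $$

Inserting these choices (5.8)-(5.9) into (5.3)-(5.4) and (5.7) we have the following summary of our decomposition:

•  The effect of the initial condition

$$ Z_0 = \xi, \quad Z_{t+1} = \left[A_t - \Sigma^{wv}_{t+1}(\Sigma^v_{t+1})^{-1}H_t\right]Z_t. \quad t = 0, 1, \ldots \tag{5.10} $$

•  The noise processes

$$ \begin{pmatrix} W_{t+1} \\ V_{t+1} \end{pmatrix} = \begin{pmatrix} W^0_{t+1} \\ V^0_{t+1} \end{pmatrix} - \Sigma_{t+1}\begin{pmatrix} 0 \\ -(\Sigma^v_{t+1})^{-1}H_t Z_t \end{pmatrix} = \begin{pmatrix} W^0_{t+1} + \Sigma^{wv}_{t+1}(\Sigma^v_{t+1})^{-1}H_t Z_t \\ V^0_{t+1} + H_t Z_t \end{pmatrix}. \quad t = 0, 1, \ldots $$

•  The auxiliary system

$$ X_0 = 0, \quad X_{t+1} = A_t X_t + W_{t+1}, \quad t = 0, 1, \ldots \tag{5.11} $$

and

$$ Y_t = H_t X_t + V_{t+1}. \quad t = 0, 1, \ldots \tag{5.12} $$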
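
The summary above can be exercised on a simulated path. The sketch below is an illustration under simplifying assumptions, not the general construction: all quantities are scalar and time-invariant, `s_wv` and `s_v` stand for $\Sigma^{wv}_{t+1}$ and $\Sigma^v_{t+1}$, and the function name and the two-point distribution for $\xi$ are hypothetical.

```python
import random


def simulate_correlated_split(a, h, s_wv, s_v, xi, w0, v0):
    """Scalar sketch of (5.10)-(5.12) with constant coefficients.

    w0[t] is W^0_{t+1} and v0[t] is V^0_{t+1}; returns one row per time t.
    """
    k = s_wv / s_v * h                # Sigma^{wv} (Sigma^v)^{-1} H
    x_full, z, x = xi, xi, 0.0        # X^0_0 = xi, Z_0 = xi, X_0 = 0
    rows = []
    for w0_next, v0_next in zip(w0, v0):
        w = w0_next + k * z           # translated plant noise W_{t+1}
        v = v0_next + h * z           # translated observation noise V_{t+1}
        y = h * x_full + v0_next      # Y_t from the original model
        rows.append((x_full, z, x, y, v))
        x_full = a * x_full + w0_next
        z = (a - k) * z               # (5.10)
        x = a * x + w                 # (5.11)
    return rows


a, h = 0.9, 1.5
rng = random.Random(2)
g1 = [rng.gauss(0.0, 1.0) for _ in range(6)]
g2 = [rng.gauss(0.0, 1.0) for _ in range(6)]
w0 = g1                                        # variance 1
v0 = [0.5 * u + e for u, e in zip(g1, g2)]     # cross-covariance 0.5, variance 1.25
xi = rng.choice([-2.0, 2.0])
rows = simulate_correlated_split(a, h, 0.5, 1.25, xi, w0, v0)
for x_full, z, x, y, v in rows:
    # Residuals of (5.1) and (5.12); both vanish up to rounding.
    print(x_full - (x + z), y - (h * x + v))
```

Setting `s_wv` to zero makes `k` vanish and recovers the recursions (4.5)-(4.6) of the uncorrelated case.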


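The Girsanov measure transformation associated with (5.7), under the choice (5.8), will be effected by the sequence $L$ of Radon-Nikodym derivatives given here by $L_0 := 1$ and

$$ L_{t+1} := \exp\left[-\sum_{s=0}^{t}[H_s Z_s]'(\Sigma^v_{s+1})^{-1}V^0_{s+1} - \tfrac{1}{2}\sum_{s=0}^{t}[H_s Z_s]'(\Sigma^v_{s+1})^{-1}[H_s Z_s]\right]. \quad t = 0, 1, \ldots $$

This expression follows from Section III with the identification

$$ \chi_t = \begin{pmatrix} 0 \\ -(\Sigma^v_{t+1})^{-1}H_t Z_t \end{pmatrix} \quad \text{and} \quad \Lambda_{t+1} = \Sigma_{t+1}. \quad t = 0, 1, \ldots $$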
Next we repeat our arguments of Section IV to evaluate the conditional expectation (1.4) for some fixed time index $t = 0, 1, \ldots$, and for some given bounded Borel mappings $\varphi : \mathbb{R}^n \to \mathbb{C}$. For each $T = 0, 1, \ldots$ we define the probability measure $P_{T+1}$ through the formula $dP_{T+1}/dP^0 = L_{T+1}$, the same formula as (4.9). Fix the time horizon $T$ so that $t \le T$. We seek to evaluate (1.4) by performing the calculations under the transformed measure $P_{T+1}$ rather than under $P^0$. To do this, we observe that the following property (G*) holds, where

(G*) The $P_{T+1}$-statistics of $\{\xi, (W_{s+1}, V_{s+1});\ s = 0, 1, \ldots, T\}$ are the same as the $P^0$-statistics of $\{\xi, (W^0_{s+1}, V^0_{s+1});\ s = 0, 1, \ldots, T\}$. As a result, $\{X_t;\ t = 0, 1, \ldots, T\}$ and $\{V_{t+1};\ t = 0, 1, \ldots, T\}$ are jointly $P_{T+1}$-Gaussian with known statistics, and the rv $\xi$ has known $P_{T+1}$-statistics and is $P_{T+1}$-independent of the observations $\{Y_s;\ s = 0, 1, \ldots, T\}$.

As a consequence, $X$ satisfies the same dynamics as $X^0$, except that $W$ replaces $W^0$, and furthermore the $P_{T+1}$-statistics of $\{W_{t+1};\ t = 0, 1, \ldots, T\}$ are the same as the $P^0$-statistics of $\{W^0_{t+1};\ t = 0, 1, \ldots, T\}$. This very nicely ties our calculations to the dynamics of the original system (1.1a).

From (5.10), we find that

$$ Z_t = \Psi(t,0)\xi, \quad t = 0, 1, \ldots $$

where here $\Psi$ is the state transition matrix defined by

$$ \Psi(s,s) = I_n, \quad \Psi(t+1,s) = \left[A_t - \Sigma^{wv}_{t+1}(\Sigma^v_{t+1})^{-1}H_t\right]\Psi(t,s), \quad s \le t. \quad s, t = 1, 2, \ldots \tag{5.13} $$

When $\Sigma^{wv}_t = 0$ for all $t = 1, 2, \ldots$, we see that $\Psi$ agrees with $\Phi$. We shall see that essentially the only change we need to make to equations (4.13)-(4.24) will be to replace $\Phi$ of (4.1) by $\Psi$ of (5.13), so it will at each step be clear how our current calculations generalize those of Section IV.

The remainder of the arguments leading to the analogue of (4.24) is essentially the same as in Section IV. The Kallianpur-Striebel formula (4.10) still holds, as does the martingale argument of (4.11) and the iterated conditioning argument of (4.12). We can still write $\bar{L}$ in the form of (4.15) if we now make the definitions

$$ B_0 := 0, \quad B_{t+1} := \sum_{s=0}^{t}[H_s\Psi(s,0)]'(\Sigma^v_{s+1})^{-1}V_{s+1}, \quad t = 0, 1, \ldots \tag{5.14} $$

and

$$ M_0 := O_n, \quad M_{t+1} := \sum_{s=0}^{t}[H_s\Psi(s,0)]'(\Sigma^v_{s+1})^{-1}[H_s\Psi(s,0)], \quad t = 0, 1, \ldots \tag{5.15} $$

instead of (4.13) and (4.14). Equations (4.16)-(4.17) still hold if we replace $\Phi(t+1,0)$ by $\Psi(t+1,0)$. We define the rvs $\{\hat{X}_t;\ t = 0, 1, \ldots, T\}$, $\{\hat{B}_t;\ t = 0, 1, \ldots, T\}$, $\{\tilde{X}_t;\ t = 0, 1, \ldots, T\}$ and $\{\tilde{B}_t;\ t = 0, 1, \ldots, T\}$ as in (4.18) and (4.19), and then (4.20) holds if we, again, replace $\Phi(t+1,0)$ by $\Psi(t+1,0)$. By virtue of the property (G*) and of the equations (5.11)-(5.12) and (5.14), we see that the rvs $\{X_{t+1}, B_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ are all linear combinations of $\{(W_{s+1}, V_{s+1});\ s = 0, 1, \ldots, t\}$, whence they are $P_{T+1}$-jointly Gaussian and independent of $\xi$. Defining $S_{t+1}$ as in (4.21), we find that (4.23) holds with $\Psi(t+1,0)$ replacing $\Phi(t+1,0)$. Finally, we have

$$ E^0\left[\varphi(X^0_{t+1}) \mid \mathcal{Y}_t\right] = \frac{\mathcal{U}_\varphi\left[\hat{X}_{t+1}, \hat{B}_{t+1};\ M_{t+1}, \Psi(t+1,0);\ S_{t+1}\right]}{\mathcal{U}_1\left[\hat{X}_{t+1}, \hat{B}_{t+1};\ M_{t+1}, \Psi(t+1,0);\ S_{t+1}\right]} \tag{5.16} $$


which is the generalization of (4.24) to the correlated case. Observe that the dependence of (5.16) upon the statistics $\hat{X}_{t+1}$, $\hat{B}_{t+1}$, $M_{t+1}$, $\Psi(t+1,0)$ and $S_{t+1}$ is the same as the dependence of (4.24) upon the statistics $\hat{X}_{t+1}$, $\hat{B}_{t+1}$, $M_{t+1}$, $\Phi(t+1,0)$ and $S_{t+1}$. It is only the definitions of these statistics which are changed, and not the form of the statistics-bearing functional, as we move from the uncorrelated case to the correlated case.

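To finish our study of the one-step prediction problem, we should give a method for actually computing $\hat{X}_{t+1}$, $\hat{B}_{t+1}$, and $S_{t+1}$. By writing down the dynamical equations for $B$ which correspond to (5.14), we see that the pair $(X, B)$ satisfies the recursive plant equation

$$ \begin{pmatrix} X_{t+1} \\ B_{t+1} \end{pmatrix} = \begin{pmatrix} A_t & 0 \\ 0 & I_n \end{pmatrix}\begin{pmatrix} X_t \\ B_t \end{pmatrix} + \begin{pmatrix} I_n & 0 \\ 0 & [H_t\Psi(t,0)]'(\Sigma^v_{t+1})^{-1} \end{pmatrix}\begin{pmatrix} W_{t+1} \\ V_{t+1} \end{pmatrix}, \quad t = 0, 1, \ldots \tag{5.17a} $$

to which we adjoin an observation equation

$$ Y_t = \begin{pmatrix} H_t & 0 \end{pmatrix}\begin{pmatrix} X_t \\ B_t \end{pmatrix} + \begin{pmatrix} 0 & I_k \end{pmatrix}\begin{pmatrix} W_{t+1} \\ V_{t+1} \end{pmatrix}. \quad t = 0, 1, \ldots \tag{5.17b} $$

Since the rvs $\{(W_{s+1}, V_{s+1});\ s = 0, 1, \ldots, T\}$ constitute a $P_{T+1}$-GWN sequence, the Kalman filtering formulae may be applied to find dynamical equations for the $P_{T+1}$-statistics $\{(\hat{X}_t, \hat{B}_t);\ t = 1, 2, \ldots, T+1\}$ and $\{S_t;\ t = 1, 2, \ldots, T+1\}$.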

Prior to applying the Kalman filtering equations to (5.17a)-(5.17b), we make several comments concerning the effect of the horizon length $T$: The probability measure $P_{T+1}$ was constructed to ensure that the finite-horizon sequence $\{(W_{t+1}, V_{t+1});\ t = 0, 1, \ldots, T\}$ is a $P_{T+1}$-GWN sequence, and thus that the system (5.17a)-(5.17b) is amenable to Gaussian linear filtering methods for $t = 0, 1, \ldots, T$. By the dynamic nature of (5.17a)-(5.17b), the rvs $\{X_{t+1}, B_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ are measurable with respect to $\{(W_{s+1}, V_{s+1});\ s = 0, 1, \ldots, t\}$ for each $t = 0, 1, \ldots$ Now, by the consistency property mentioned at the end of Section III, the statistics of $\{(W_{s+1}, V_{s+1});\ s = 0, 1, \ldots, t\}$ are the same under any two measures $P_{T+1}$ and $P_{T'+1}$ so long as $t \le T \le T'$, and therefore the rvs $\hat{X}_{t+1}$, $\hat{B}_{t+1}$ and $S_{t+1}$ of (4.18) and (4.21) do not depend upon the horizon $T$ for $T \ge t$. In particular, we may take $T = t$ in (4.18) and (4.21), setting

$$ \hat{X}_{t+1} := E_{t+1}[X_{t+1} \mid \mathcal{Y}_t] \quad \text{and} \quad \hat{B}_{t+1} := E_{t+1}[B_{t+1} \mid \mathcal{Y}_t] \tag{5.18} $$

and

$$ S_{t+1} := E_{t+1}\left[\begin{pmatrix} X_{t+1} - \hat{X}_{t+1} \\ B_{t+1} - \hat{B}_{t+1} \end{pmatrix}\begin{pmatrix} X_{t+1} - \hat{X}_{t+1} \\ B_{t+1} - \hat{B}_{t+1} \end{pmatrix}'\right] \tag{5.19} $$

for all $t = 0, 1, \ldots$, which defines $\hat{X}$, $\hat{B}$ and $S$ as infinite-horizon sequences.

Turning now to the structure of the Kalman filter, as in (1.6)-(1.7), we recall that one of the strengths of the Kalman filtering equations is that they are recursive, i.e., at any time $t = 0, 1, \ldots$, the one-step predictor of the state at time $t+1$ and the corresponding (conditional) covariance matrix may be obtained by updating these quantities at time $t$ using the observation at time $t$. Combining this with the remarks concerning the consistency of the measures $\{P_{T+1};\ T = 0, 1, \ldots\}$, we conclude that the Kalman filtering equations which yield $\hat{X}_{t+1}$, $\hat{B}_{t+1}$ and $S_{t+1}$ do not depend on the choice of the horizon $T$, with $T \ge t$. A moment of reflection will convince the reader that consequently, we may get valid recursive equations for $\hat{X}$, $\hat{B}$ and $S$ by applying the Kalman filtering equations to (5.17a)-(5.17b) as if there were a measure $P$ under which the entire infinite-horizon sequence $\{(W_{t+1}, V_{t+1});\ t = 0, 1, \ldots\}$ were a $P$-GWN sequence. In line with the closing comments of Section III, we remark that such a measure $P$ will not in general exist; we are merely introducing it as a fictitious construct to aid us in arriving at valid equations for $\hat{X}$, $\hat{B}$ and $S$.

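With these thoughts in mind, we apply the Kalman filtering equations (1.6)-(1.7) to the higher-dimensional system (5.17a)-(5.17b). To help us, we first write

$$ S_t = \begin{pmatrix} P_t & Q_t \\ Q_t' & R_t \end{pmatrix}, \quad t = 0, 1, \ldots \tag{5.20} $$

as this corresponds to a natural partition of the covariance matrices of (5.19) according to the (conditional) covariances of the processes $X$ and $B$. After some simplification, we get the following recursions for the sequences $P = \{P_t;\ t = 0, 1, \ldots\}$, $Q = \{Q_t;\ t = 0, 1, \ldots\}$ and $R = \{R_t;\ t = 0, 1, \ldots\}$: The $\mathcal{Q}_n$-valued deterministic sequence $P$ satisfies the recursion

$$ P_0 = O_n, \quad P_{t+1} = A_t P_t A_t' + \Sigma^w_{t+1} - \left[A_t P_t H_t' + \Sigma^{wv}_{t+1}\right]\left[H_t P_t H_t' + \Sigma^v_{t+1}\right]^{-1}\left[A_t P_t H_t' + \Sigma^{wv}_{t+1}\right]'. \quad t = 0, 1, \ldots \tag{5.21} $$

For convenience, we introduce the $\mathcal{Q}_k$-valued deterministic sequence $J = \{J_t;\ t = 0, 1, \ldots\}$ by

$$ J_t := H_t P_t H_t' + \Sigma^v_{t+1}. \quad t = 0, 1, \ldots $$

The recursion for the $\mathcal{M}_n$-valued sequence $Q$ is

$$ Q_0 = O_n, \quad Q_{t+1} = A_t Q_t - \left[A_t P_t H_t' + \Sigma^{wv}_{t+1}\right]J_t^{-1}H_t\left(Q_t + \Psi(t,0)\right) + \Sigma^{wv}_{t+1}(\Sigma^v_{t+1})^{-1}H_t\Psi(t,0), \quad t = 0, 1, \ldots \tag{5.22} $$

and the recursion for the $\mathcal{Q}_n$-valued sequence $R$ is

$$ R_0 = O_n, \quad R_{t+1} = R_t - \left(Q_t + \Psi(t,0)\right)'H_t' J_t^{-1}H_t\left(Q_t + \Psi(t,0)\right) + \Psi'(t,0)H_t'(\Sigma^v_{t+1})^{-1}H_t\Psi(t,0). \quad t = 0, 1, \ldots \tag{5.23} $$

The processes $\hat{X}$ and $\hat{B}$ of (5.18) satisfy the dynamical equations

$$ \hat{X}_0 = 0, \quad \hat{X}_{t+1} = A_t\hat{X}_t + \left[A_t P_t H_t' + \Sigma^{wv}_{t+1}\right]J_t^{-1}\left[Y_t - H_t\hat{X}_t\right], \quad t = 0, 1, \ldots \tag{5.24} $$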
and

$$ \hat{B}_0 = 0, \quad \hat{B}_{t+1} = \hat{B}_t + \left(Q_t + \Psi(t,0)\right)'H_t' J_t^{-1}\left[Y_t - H_t\hat{X}_t\right]. \quad t = 0, 1, \ldots \tag{5.25} $$

Let us also, if only for the sake of having a dynamical representation for all of the processes, rewrite (5.15) in the form

$$ M_0 = O_n, \quad M_{t+1} = M_t + \Psi(t,0)'H_t'(\Sigma^v_{t+1})^{-1}H_t\Psi(t,0). \quad t = 0, 1, \ldots \tag{5.26} $$

Thus we have that for any bounded Borel mapping $\varphi$ from $\mathbb{R}^n$ to $\mathbb{C}$, (5.16) holds, where $S$, $\hat{X}$ and $\hat{B}$ are defined by (5.20), (5.24) and (5.25). We state this result as a theorem:

Theorem 5.1. For any bounded Borel mapping $\varphi : \mathbb{R}^n \to \mathbb{C}$ and any $t = 1, 2, \ldots$, we have that

$$ E^0\left[\varphi(X^0_{t+1}) \mid \mathcal{Y}_t\right] = \frac{\mathcal{U}_\varphi\left[\hat{X}_{t+1}, \hat{B}_{t+1};\ M_{t+1}, \Psi(t+1,0);\ S_{t+1}\right]}{\mathcal{U}_1\left[\hat{X}_{t+1}, \hat{B}_{t+1};\ M_{t+1}, \Psi(t+1,0);\ S_{t+1}\right]} \tag{5.27} $$

where the processes $S$, $\hat{X}$, $\hat{B}$ and $M$ are given respectively by (5.20), (5.24), (5.25) and (5.26), and the state transition matrix $\Psi$ is defined by (5.13). The component sequences $P$, $Q$ and $R$ of $S$ propagate according to (5.21)-(5.23).

Observe the very special structure of (5.27) in that $E^0[\varphi(X^0_{t+1}) \mid \mathcal{Y}_t]$ can be computed by inserting a collection of finite-dimensional and computable sufficient statistics which do not depend upon the distribution of the initial condition into a functional which is determined solely by the initial distribution. Note also that this functional does not depend upon time.

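The calculation (4.25) at the end of Section IV may now be more compactly rewritten as

$$E_{t+1}\big[L_{t+1} \,\big|\, \mathcal{Y}_t \vee \sigma\{\xi\}\big] = \exp\big[-\tfrac{1}{2}\,\xi'(M_{t+1} - R_{t+1})\,\xi + \xi' B_{t+1}\big]. \tag{5.28}$$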


As  a  closing  remark,  we  direct  the  reader’s  attention  to  the  similarity 
of  (1.6)-(1.7)  and  (5.21),  (5.24).  The  reader  will  note  the  pleasing  fact 
that  the  recursions  for  X  and  P  are  exactly  those  we  would  get  for  the 
conditional means and conditional covariances of (1.1a)-(1.1b) if $\xi = 0$
(i.e., $\xi$ is a degenerate Gaussian rv).
VI.  REPRESENTATIONS  FOR  THE  MMSE,  LLSE,

AND  MMSE-LLSE  ERROR 

Having  solved  the  one-step  prediction  problem  in  Theorem  5.1,  we  now 
use  this  result  to  give  representations  for  the  MMSE  and  LLSE  estimates 
and for the mean square error between these estimates. As these representations are basic to the ensuing asymptotic analysis of $\varepsilon$ given in Sections VII and VIII, we devote this entire section to their derivations.

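As perhaps the most direct way of representing the MMSE process $\hat{X}$ using Theorem 5.1, we will first find a representation for the conditional characteristic function

$$E^0\big[\exp[i\theta' X^0_{t+1}] \,\big|\, \mathcal{Y}_t\big], \quad \theta \in \mathbb{R}^n,\ t = 0, 1, \ldots$$

and then differentiate it with respect to $\theta$ at $\theta = 0$. This line of argument is readily validated by the fact that the rv $X^0_{t+1}$ is an element of $L^2(\Omega, \mathcal{F}, P^0)$. To carry out the calculations, we find it convenient to define, for each $\theta$ in $\mathbb{R}^n$, the bounded Borel mapping $\varphi_\theta : \mathbb{R}^n \to \mathbb{C}$ by

$$\varphi_\theta(x) := \exp[i\theta' x], \quad x \in \mathbb{R}^n.$$

Fix $t = 0, 1, \ldots$ and $\theta$ in $\mathbb{R}^n$, and recall the definitions (2.4) and (2.5). We find by simple calculations that

$$\mathcal{T}\varphi_\theta[x, b;\ S_{t+1}] = \exp\big[i\theta' x - \tfrac{1}{2}\theta' P_{t+1}\theta + i\theta' Q_{t+1} b + \tfrac{1}{2} b' R_{t+1} b\big], \quad x, b \in \mathbb{R}^n.$$

Another simple calculation shows that

$$\mathcal{U}\varphi_\theta[x, b;\ M_{t+1}, \Phi(t+1,0);\ S_{t+1}] = E^0\Big[\exp\big[i\theta'(x + Q^*_{t+1}\xi) - \tfrac{1}{2}\theta' P_{t+1}\theta + b'\xi - \tfrac{1}{2}\xi' R^*_{t+1}\xi\big]\Big]$$

$$= \int_{\mathbb{R}^n} \exp\big[i\theta'(x + Q^*_{t+1}z) - \tfrac{1}{2}\theta' P_{t+1}\theta + b'z - \tfrac{1}{2}z' R^*_{t+1}z\big]\,dF(z), \quad x, b \in \mathbb{R}^n, \tag{6.1}$$

where we have set

$$Q^*_t := Q_t + \Phi(t,0), \quad \text{and} \quad R^*_t := M_t - R_t.$$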
Since $1 = \varphi_\theta$ for $\theta = 0$, we can set $\theta = 0$ in (6.1) in order to get

$$\mathcal{U}1[x, b;\ M_{t+1}, \Phi(t+1,0);\ S_{t+1}] = \int_{\mathbb{R}^n} \exp\big[b'z - \tfrac{1}{2}z' R^*_{t+1} z\big]\,dF(z), \quad x, b \in \mathbb{R}^n. \tag{6.2}$$


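Combining the expressions (6.1) and (6.2), when evaluated at $x = X_{t+1}$ and $b = B_{t+1}$, we conclude from Theorem 5.1 that

$$E^0\big[\exp[i\theta' X^0_{t+1}] \,\big|\, \mathcal{Y}_t\big] = \frac{\int_{\mathbb{R}^n} \exp\big[i\theta'(X_{t+1} + Q^*_{t+1}z) - \tfrac{1}{2}\theta' P_{t+1}\theta + z'B_{t+1} - \tfrac{1}{2}z' R^*_{t+1}z\big]\,dF(z)}{\int_{\mathbb{R}^n} \exp\big[z'B_{t+1} - \tfrac{1}{2}z' R^*_{t+1}z\big]\,dF(z)}. \tag{6.3}$$

Finally, upon taking the gradient in (6.3) at $\theta = 0$, we obtain the following result.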

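Theorem 6.1. For each $t = 0, 1, \ldots$, the relation

$$E^0[X^0_{t+1} \mid \mathcal{Y}_t] = X_{t+1} + Q^*_{t+1}\,\frac{\int_{\mathbb{R}^n} z\,\exp\big[z'B_{t+1} - \tfrac{1}{2}z' R^*_{t+1}z\big]\,dF(z)}{\int_{\mathbb{R}^n} \exp\big[z'B_{t+1} - \tfrac{1}{2}z' R^*_{t+1}z\big]\,dF(z)} \tag{6.4}$$

holds.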
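
As a hedged illustration of how (6.4) might be evaluated, the following Python sketch treats the scalar case ($n = 1$) under the additional assumption, not made in the text, that $F$ is a finite mixture of point masses, so that both integrals in (6.4) reduce to finite sums. The function name `mmse_predictor` and its argument names are hypothetical; the sums are computed after a log-weight shift purely for numerical stability.

```python
import math

def mmse_predictor(x, b, q_star, r_star, atoms):
    """Scalar sketch of (6.4) when F is a finite mixture of point masses.

    x, b           : current values of X_{t+1} and B_{t+1}
    q_star, r_star : current values of Q*_{t+1} and R*_{t+1}
    atoms          : list of (z, prob) pairs describing F (an assumption)
    """
    # Log-weights log(prob) + z*b - r*z^2/2, shifted by their maximum so
    # that the exponentials cannot overflow.
    logw = [math.log(p) + z * b - 0.5 * r_star * z * z for z, p in atoms]
    shift = max(logw)
    w = [math.exp(lw - shift) for lw in logw]
    num = sum(wi * z for wi, (z, _) in zip(w, atoms))   # integral of z * gamma dF
    den = sum(w)                                        # integral of gamma dF
    return x + q_star * num / den
```

For the symmetric two-point distribution on $\{-1, +1\}$ the ratio of sums reduces to $\tanh(B_{t+1})$, so the sketch returns $X_{t+1} + Q^*_{t+1}\tanh(B_{t+1})$, which is visibly nonlinear in the statistic $B_{t+1}$.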

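In the spirit of the recursions (5.22)-(5.23), we can now write down the dynamical equations for the matrices $Q^*$ and $R^*$; we find that

$$Q^*_0 = I_n, \qquad Q^*_{t+1} = \big[A_t - [A_t P_t H_t' + \Sigma^{wv}_{t+1}]\,J_t^{-1} H_t\big]\,Q^*_t, \quad t = 0, 1, \ldots \tag{6.5}$$

and

$$R^*_0 = O_n, \qquad R^*_{t+1} = R^*_t + Q_t^{*\prime} H_t' J_t^{-1} H_t Q^*_t, \quad t = 0, 1, \ldots \tag{6.6}$$

Using the process $Q^*$ instead of $Q$, we can simplify the dynamics of $B$ to read

$$B_0 = 0, \qquad B_{t+1} = B_t + Q_t^{*\prime} H_t' J_t^{-1}\,[Y_t - H_t X_t], \quad t = 0, 1, \ldots$$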
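
The recursions (5.21), (5.24), (6.5), (6.6) and the simplified dynamics of $B$ can be sketched together in the scalar, time-invariant case. The Python fragment below is only an illustration under those assumptions (constant coefficients, $n = k = 1$, $\Sigma^v > 0$); the function name `run_recursions` and its argument names are hypothetical.

```python
def run_recursions(a, h, sw, sv, swv, ys):
    """Scalar, time-invariant sketch of (5.21), (5.24), (6.5), (6.6) and B.

    a, h        : system coefficients A and H
    sw, sv, swv : Sigma^w, Sigma^v and Sigma^{wv} (sv > 0 assumed)
    ys          : observations Y_0, Y_1, ...
    Returns the final values (P, X, Q*, R*, B).
    """
    p, x, qs, rs, b = 0.0, 0.0, 1.0, 0.0, 0.0   # P_0, X_0, Q*_0, R*_0, B_0
    for y in ys:
        j = h * p * h + sv          # J_t = H P_t H' + Sigma^v
        g = a * p * h + swv         # A P_t H' + Sigma^{wv}
        innov = y - h * x           # Y_t - H X_t
        # Every update uses time-t quantities, so compute all of them
        # before overwriting the state.
        b_next = b + qs * h * innov / j
        rs_next = rs + qs * h * h * qs / j
        qs_next = (a - g * h / j) * qs
        x_next = a * x + g * innov / j
        p_next = a * p * a + sw - g * g / j
        p, x, qs, rs, b = p_next, x_next, qs_next, rs_next, b_next
    return p, x, qs, rs, b
```

Note that $P$, $Q^*$ and $R^*$ never touch the observations, so in this sketch they could equally be precomputed offline; only $X$ and $B$ are data-driven.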

Proceeding now to the LLSE process $X^K$, we can easily get a representation for it by making use of the following two facts: (a) The LLSE or Kalman filter depends solely on the first and second moments of the involved rvs [2, Sec. 5.2]; (b) The MMSE and LLSE filters coincide if $\xi$ has a Gaussian distribution. Governed by these thoughts, we replace $F$ in (6.4) by a Gaussian distribution of mean $\mu$ and covariance $\Delta$ to get, after some calculations, the following representation result.

Theorem 6.2. If the covariance matrix $\Delta$ of the initial condition $\xi$ is invertible, then we have the relation

$$X^K_{t+1} = X_{t+1} + Q^*_{t+1}\,[R^*_{t+1} + \Delta^{-1}]^{-1}\,[B_{t+1} + \Delta^{-1}\mu], \quad t = 1, 2, \ldots$$
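
The "some calculations" behind Theorem 6.2 amount to a Gaussian completion of squares. As a hedged numerical check in the scalar case, the Python sketch below compares a trapezoid approximation of the ratio of integrals in (6.4), with $F$ Gaussian of mean $\mu$ and variance $\Delta$, against the closed form $[R^* + \Delta^{-1}]^{-1}[B + \Delta^{-1}\mu]$. The function names, the grid size and the integration width are illustrative assumptions.

```python
import math

def posterior_ratio(b, r_star, mu, delta, n=20001, width=12.0):
    """Trapezoid approximation of (int z g dz) / (int g dz), where
    g(z) = exp(z*b - r_star*z^2/2) * exp(-(z - mu)^2 / (2*delta)).
    The Gaussian normalising constant cancels in the ratio and is omitted.
    """
    sd = math.sqrt(delta)
    lo, hi = mu - width * sd, mu + width * sd
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        z = lo + i * step
        w = math.exp(z * b - 0.5 * r_star * z * z - 0.5 * (z - mu) ** 2 / delta)
        if i in (0, n - 1):
            w *= 0.5                 # trapezoid end-point weights
        num += z * w
        den += w
    return num / den

def closed_form(b, r_star, mu, delta):
    """Scalar version of [R* + Delta^{-1}]^{-1} [B + Delta^{-1} mu]."""
    return (b + mu / delta) / (r_star + 1.0 / delta)
```

Multiplying either quantity by $Q^*_{t+1}$ and adding $X_{t+1}$ gives the scalar LLSE predictor of Theorem 6.2.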


In view of this last result, we strengthen (A.3) to condition (A.4), namely

(A.4): The initial condition $\xi$ has distribution $F$ with finite first and second moments $\mu$ and $\Delta$, respectively, and is independent of the process $(W^0, V^0)$. Furthermore, $\Delta$ is invertible, so that $F$ belongs to $\mathcal{D}(\mathbb{R}^n)$.

For the remainder of this paper we assume (A.1), (A.2) and (A.4) to hold.

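The final representation we seek is for the error process $\varepsilon$ defined by (1.10), which measures how well the MMSE and LLSE estimates agree in quadratic mean. We shall naturally start from the representation results of Theorems 6.1 and 6.2. In order to make the notation less cumbersome, we introduce the mapping $\gamma : \mathbb{R}^n \times \mathbb{R}^n \times \mathcal{Q}_n \to \mathbb{R}$ defined by

$$\gamma(z, b; R) := \exp\big[z'b - \tfrac{1}{2}z'Rz\big], \quad z, b \in \mathbb{R}^n,\ R \in \mathcal{Q}_n. \tag{6.7}$$

Fix $t = 0, 1, \ldots$. Combining Theorems 6.1 and 6.2, we write the difference $\hat{X}_{t+1} - X^K_{t+1}$, with the help of the function $\gamma$, in the form

$$\hat{X}_{t+1} - X^K_{t+1} = Q^*_{t+1}\,\frac{\int_{\mathbb{R}^n} \{\cdots\}\,\gamma(z, B_{t+1}; R^*_{t+1})\,dF(z)}{\int_{\mathbb{R}^n} \gamma(z, B_{t+1}; R^*_{t+1})\,dF(z)} \tag{6.8}$$

where

$$\{\cdots\} = z - [R^*_{t+1} + \Delta^{-1}]^{-1}\,[B_{t+1} + \Delta^{-1}\mu].$$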

We clearly see from this formula that $\hat{X}_{t+1} - X^K_{t+1}$, which we a priori know depends upon the observations $\{Y_0, Y_1, \ldots, Y_t\}$, in fact depends upon these observations only through the rv $B_{t+1}$. To calculate $\varepsilon_{t+1}$, we thus must, after applying the mapping $x \mapsto \|x\|^2$, average the expression (6.8) over $B_{t+1}$.

To gain some insight into how to perform these calculations, recall the definition (4.18) of $P_{t+1}$ as well as the fact that the rvs $\{B_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ are jointly $P_{t+1}$-Gaussian. In contrast, the $P^0$-statistics of $B_{t+1}$ are in general non-Gaussian! This state of affairs suggests that the computations for $\varepsilon_{t+1}$ should be performed under $P_{t+1}$ instead of under $P^0$. We shall in fact see that many of the properties of $P_{t+1}$ which were used in deriving Theorem 5.1 will also be used here.

We  begin  by  observing  that 


£t+ 1  —  E£+i 

[ll^+i  -  *£ill%+i] 

=  Ei+r 

[ll^+i  -  ^+ill2Et+i  [ir+iltt  v 

(6.9) 

=  %+i 

[\\xt+1  - xtK+l\f )((,%>;%>)  . 

(6.10) 

In  passing  from  (6.9)  to  (6.10)  we  have  used  (5.28).  We  also  recall  that 
the  rv  Bt+i  and  the  Tt -measurable  rv  Xt+ 1  -  %t+\  are  P<+i~independent 
of  £.  Therefore,  a  simple  conditioning  argument  in  (6.10)  followed  by  an 
application  of  Tonelli’s  theorem  yields 

£f+i  =  Et+i  -  X*xf  [  Bt+i;  Mt+i)dF(z) 

L  7i" 

=  /  %+i\\\Xt+1-Xi<+l\\21(z,Bt+1-Mt+1)}dF(z).  (6.11) 

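We next decompose $\bar{B}_{t+1}$ as $\bar{B}_{t+1} = B_{t+1} + \tilde{B}_{t+1}$ as in (4.18)-(4.19). Since $\hat{X}_{t+1} - X^K_{t+1}$ is $\sigma\{B_{t+1}\}$-measurable, as we remarked above, we find that

$$E_{t+1}\big[\|\hat{X}_{t+1} - X^K_{t+1}\|^2\,\gamma(z, \bar{B}_{t+1}; M_{t+1})\big] = E_{t+1}\Big[\|\hat{X}_{t+1} - X^K_{t+1}\|^2\,E_{t+1}\big[\gamma(z, \bar{B}_{t+1}; M_{t+1}) \,\big|\, B_{t+1}\big]\Big], \quad z \in \mathbb{R}^n.$$

In Sections IV and V we used the fact that $\tilde{B}_{t+1}$ and $B_{t+1}$ are $P_{t+1}$-independent; we again use this fact, together with the definition of the matrix $R_{t+1}$ as the $P_{t+1}$-covariance of the zero-mean $P_{t+1}$-Gaussian rv $\tilde{B}_{t+1}$, to make the simplification

$$E_{t+1}\big[\gamma(z, \bar{B}_{t+1}; M_{t+1}) \,\big|\, B_{t+1}\big] = E_{t+1}\big[\gamma(z, b + \tilde{B}_{t+1}; M_{t+1})\big]_{b = B_{t+1}}$$

$$= \exp\big[z'B_{t+1} - \tfrac{1}{2}z'R^*_{t+1}z\big]$$

$$= \gamma(z, B_{t+1}; R^*_{t+1}), \quad z \in \mathbb{R}^n.$$

Therefore,

$$E_{t+1}\big[\|\hat{X}_{t+1} - X^K_{t+1}\|^2\,\gamma(z, \bar{B}_{t+1}; M_{t+1})\big] = E_{t+1}\big[\|\hat{X}_{t+1} - X^K_{t+1}\|^2\,\gamma(z, B_{t+1}; R^*_{t+1})\big], \quad z \in \mathbb{R}^n. \tag{6.12}$$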


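Inserting (6.12) into (6.11), we readily see from (6.8) that

$$\varepsilon_{t+1} = \int_{\mathbb{R}^n} E_{t+1}\big[\|\hat{X}_{t+1} - X^K_{t+1}\|^2\,\gamma(z, B_{t+1}; R^*_{t+1})\big]\,dF(z) \tag{6.13}$$

$$= E_{t+1}\left[\frac{\big\|Q^*_{t+1}\int_{\mathbb{R}^n} \{\cdots\}\,\gamma(z, B_{t+1}; R^*_{t+1})\,dF(z)\big\|^2}{\int_{\mathbb{R}^n} \gamma(z, B_{t+1}; R^*_{t+1})\,dF(z)}\right] \tag{6.14}$$

where

$$\{\cdots\} = z - [R^*_{t+1} + \Delta^{-1}]^{-1}\,[B_{t+1} + \Delta^{-1}\mu]$$

after a simple cancellation. It now remains only to find the $P_{t+1}$-statistics of the rv $B_{t+1}$; these are presented in the following lemma.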

Lemma 6.3. For all $t = 0, 1, \ldots$, under the probability measure $P_{t+1}$, the rv $B_{t+1}$ is Gaussian with zero mean and covariance $R^*_{t+1}$.

Proof. The arguments of Section V already show us that $B_{t+1}$ is $P_{t+1}$-Gaussian: Indeed, the rv $B_{t+1}$ is the conditional $P_{t+1}$-expectation of $\bar{B}_{t+1}$ given the rvs $\{Y_0, Y_1, \ldots, Y_t\}$, and moreover the rvs $\{\bar{B}_{t+1}, Y_0, Y_1, \ldots, Y_t\}$ are jointly $P_{t+1}$-Gaussian. The rv $B_{t+1}$ has zero mean under $P_{t+1}$ since $\bar{B}_{t+1}$ has zero mean under $P_{t+1}$. Finally, since the rvs $B_{t+1}$ and $\tilde{B}_{t+1}$ are $P_{t+1}$-independent, it is easy to see that

$$\mathrm{Cov}_{P_{t+1}}\bar{B}_{t+1} = \mathrm{Cov}_{P_{t+1}}B_{t+1} + \mathrm{Cov}_{P_{t+1}}\tilde{B}_{t+1}.$$

From (5.14) and (5.15), we see that $\mathrm{Cov}_{P_{t+1}}\bar{B}_{t+1} = M_{t+1}$, and by definition, $R_{t+1}$ is the $P_{t+1}$-covariance of the rv $\tilde{B}_{t+1}$. Hence $\mathrm{Cov}_{P_{t+1}}B_{t+1} = M_{t+1} - R_{t+1} = R^*_{t+1}$. This completes the proof.
This last property leads to a simple representation for the error process $\varepsilon$ when used on (6.13)-(6.14).

Theorem 6.4. For each $t = 0, 1, \ldots$, we have the representation

$$\varepsilon_{t+1} = \int_{\mathbb{R}^n} \frac{\big\|Q^*_{t+1}\int_{\mathbb{R}^n} \{\cdots\}\,\gamma(z, b; R^*_{t+1})\,dF(z)\big\|^2}{\int_{\mathbb{R}^n} \gamma(z, b; R^*_{t+1})\,dF(z)}\,dG_{R^*_{t+1}}(b) \tag{6.15}$$

where

$$\{\cdots\} = z - [R^*_{t+1} + \Delta^{-1}]^{-1}\,[b + \Delta^{-1}\mu].$$

VII.  THE  ASYMPTOTIC  BEHAVIOR  OF  ε:

THE  MULTIVARIABLE  CASE 

We now begin our study of the asymptotics of the error process $\varepsilon$, with Theorem 6.4 as our main tool, and based upon the dependencies suggested by (1.11). In order to make such an analysis feasible, we must of course enforce some structure on the asymptotics of the matrix-valued sequences $A$, $H$ and $\Sigma$ which describe the dynamics of the system (1.1a)-(1.1b). We shall from here on restrict ourselves to the classical time-homogeneous situation. This is captured by assumption (A.5) enforced hereafter, where

(A.5): The matrix-valued sequences $A$, $H$ and $\Sigma$ are constant; that is, there are matrices $A$, $H$ and $\Sigma$ in $\mathcal{M}_n$, $\mathcal{M}_{k \times n}$ and $\mathcal{Q}_{n+k}$, respectively, such that for all $t = 0, 1, \ldots$, $A_t = A$, $H_t = H$ and $\Sigma_{t+1} = \Sigma$.

It is clear from the form of (6.15) that we may study $\varepsilon_t$ as a functional of the initial distribution $F$ and of the matrices $Q^*_t$ and $R^*_t$ for each $t = 1, 2, \ldots$. Not surprisingly then, a central component of our efforts will be an analysis of the asymptotics of the sequences $Q^*$ and $R^*$. An examination of the dynamics of these sequences will indicate several cases which are amenable to straightforward analysis. However, as is often the case with general multivariable systems, a complete analysis is not possible.

A considerable simplification occurs in (6.15) if $\mu = E[\xi] = 0$. Of course, since the dynamics (1.1a)-(1.1b) are linear, no generality is lost by assuming that the initial distribution $F$ is centered. Indeed, a simple translation argument shows that for any distribution $F$ in $\mathcal{D}(\mathbb{R}^n)$ with mean $\mu$, the relation

$$\varepsilon_t((A, H, \Sigma), F) = \varepsilon_t((A, H, \Sigma), \tilde{F}), \quad t = 1, 2, \ldots \tag{7.1}$$

holds, where $\tilde{F}$ is the centered distribution in $\mathcal{D}^0(\mathbb{R}^n)$ defined by the translation

$$\tilde{F}(x) = F(x + \mu), \quad x \in \mathbb{R}^n. \tag{7.2}$$

Consequently, we shall henceforth in this section consider only distributions $F$ in $\mathcal{D}^0(\mathbb{R}^n)$.

In order to understand more clearly the dependence of the right-hand side of (6.15) upon the distribution $F$ and the matrices $Q^*$ and $R^*$, we define the mapping $I_F : \mathcal{M}_n \times \mathcal{Q}_n \to \mathbb{R}$ as

$$I_F(K, R) := \int_{\mathbb{R}^n} \frac{\big\|K\int_{\mathbb{R}^n} \{z - [R + \Delta^{-1}]^{-1}b\}\,\gamma(z, b; R)\,dF(z)\big\|^2}{\int_{\mathbb{R}^n} \gamma(z, b; R)\,dF(z)}\,dG_R(b), \quad K \in \mathcal{M}_n,\ R \in \mathcal{Q}_n. \tag{7.3}$$

In Lemma 7.1 below we verify that this expression is well defined for all matrices $K$ in $\mathcal{M}_n$ and $R$ in $\mathcal{Q}_n$, so that (6.15) may indeed be rewritten as

$$\varepsilon_t = I_F(Q^*_t, R^*_t), \quad t = 1, 2, \ldots \tag{7.4}$$

This clearly separates the dependence of $\varepsilon_t$ on the matrices $Q^*_t$ and $R^*_t$, which depend only on the system triple $(A, H, \Sigma)$, from the dependence on the initial distribution $F$. The distribution $F$ affects $\varepsilon_t$ only through the structure of the functional $I_F$, whereas the system triple and time affect $\varepsilon_t$ only through the matrices $Q^*_t$ and $R^*_t$.

Before  beginning  with  our  calculations,  we  set 

$$\Gamma(b; R) := \int_{\mathbb{R}^n} \gamma(z, b; R)\,dF(z), \quad b \in \mathbb{R}^n,\ R \in \mathcal{Q}_n.$$

Some of our manipulations of (7.3) will be clearer when using this notation. We first show that (7.3) is well defined for all $K$ in $\mathcal{M}_n$ and all $R$ in $\mathcal{Q}_n$.
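Lemma 7.1. Let $F$ be a distribution in $\mathcal{D}^0(\mathbb{R}^n)$. For all $K$ in $\mathcal{M}_n$ and $R$ in $\mathcal{Q}_n$, the quantity $I_F(K, R)$ is well defined and finite, with alternate representation

$$I_F(K, R) = \int_{\mathbb{R}^n} \{\cdots\}\,\Gamma(b; R)\,dG_R(b), \quad K \in \mathcal{M}_n,\ R \in \mathcal{Q}_n, \tag{7.5}$$

where

$$\{\cdots\} = \left\|K\,\frac{\int_{\mathbb{R}^n} \{z - [R + \Delta^{-1}]^{-1}b\}\,\gamma(z, b; R)\,dF(z)}{\Gamma(b; R)}\right\|^2.$$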

Proof. In view of (7.4), we already know, via the probabilistic arguments of Section VI, that (7.3) is well defined and finite when $K = Q^*_t$ and $R = R^*_t$ for all $t = 1, 2, \ldots$, given any system triple $(A, H, \Sigma)$. Rather than working to extend these probabilistic calculations to cover all $K$ and $R$, we shall instead study formula (7.3) by simple techniques from analysis. In the process, we shall obtain a bound, inequality (7.9), which will prove useful later on.

Fix $K$ in $\mathcal{M}_n$ and $R$ in $\mathcal{Q}_n$. First we show that $I_F(K, R)$ is well defined. Observe that whenever $b$ lies in the range $\mathrm{Im}(R)$ of $R$, the quadratic form in the exponent of $\gamma$ in (6.7) is amenable to a completion of squares, namely

$$z'b - \tfrac{1}{2}z'Rz = \tfrac{1}{2}b'R^{\#}b - \tfrac{1}{2}(z - R^{\#}b)'R\,(z - R^{\#}b), \quad z \in \mathbb{R}^n,\ b \in \mathrm{Im}(R),$$

where $R^{\#}$ denotes the Moore-Penrose pseudo-inverse of $R$ [2, pp. 329-330]. Consequently,

$$0 < \gamma(z, b; R) \le \exp\big[\tfrac{1}{2}b'R^{\#}b\big], \quad z \in \mathbb{R}^n,\ b \in \mathrm{Im}(R), \tag{7.6}$$

and the bound

$$0 < \Gamma(b; R) \le \exp\big[\tfrac{1}{2}b'R^{\#}b\big], \quad b \in \mathrm{Im}(R), \tag{7.7}$$

holds. The bound (7.6) and the finite second moment assumption (A.2) on $\xi$ together imply that the inner integral in the numerator of (7.3) is well defined and finite for each $b$ in $\mathrm{Im}(R)$. Therefore, since the support of the Gaussian distribution $G_R$ is exactly $\mathrm{Im}(R)$, we conclude, using (7.7), that $I_F(K, R)$ is indeed well defined. As a result of this discussion, we see that the alternate expression (7.5) is indeed valid.

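Next, to show that $I_F(K, R)$ is finite, we first observe from Jensen's inequality and from the definition of the operator norm (2.1) that

$$\left\|K\,\frac{\int_{\mathbb{R}^n} \{z - [R + \Delta^{-1}]^{-1}b\}\,\gamma(z, b; R)\,dF(z)}{\Gamma(b; R)}\right\|^2 \le \|K\|_{op}^2\,\frac{\int_{\mathbb{R}^n} \|z - [R + \Delta^{-1}]^{-1}b\|^2\,\gamma(z, b; R)\,dF(z)}{\Gamma(b; R)}, \quad b \in \mathrm{Im}(R). \tag{7.8}$$

Next, we set

$$J_F(R) := \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \|z - [R + \Delta^{-1}]^{-1}b\|^2\,\gamma(z, b; R)\,dF(z)\,dG_R(b)$$

$$= \int_{\mathbb{R}^n}\left[\int_{\mathbb{R}^n} \|z - [R + \Delta^{-1}]^{-1}b\|^2 \exp[z'b]\,dG_R(b)\right]\exp\big[-\tfrac{1}{2}z'Rz\big]\,dF(z)$$

where the last equality follows from Tonelli's theorem, and it is now plain from (7.5) and (7.8) that

$$I_F(K, R) \le \|K\|_{op}^2\,J_F(R). \tag{7.9}$$

Finally, after some tedious calculations, we find that

$$J_F(R) = \mathrm{trace}\big([R + \Delta^{-1}]^{-1}R\,[R + \Delta^{-1}]^{-1}\big) + \int_{\mathbb{R}^n} z'\Delta^{-1}[R + \Delta^{-1}]^{-1}[R + \Delta^{-1}]^{-1}\Delta^{-1}z\,dF(z) \tag{7.10}$$

and therefore, since $\xi$ has finite second moments, $J_F(R)$ is finite and so is $I_F(K, R)$ as a result of (7.9). ■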
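
To make the bound (7.9) concrete, the following Python sketch evaluates the scalar functional $I_F(K, R)$ of (7.3) by a trapezoid rule for one hypothetical non-Gaussian example, namely $F$ placing mass $1/2$ at each of $\pm a$ (so that $\mu = 0$ and $\Delta = a^2$), and compares it with $J_F(R)$. For this $F$ the inner integrals over $z$ are available in closed form through $\sinh$ and $\cosh$, and for centered scalar $F$ the expression (7.10) collapses to $J_F(R) = [R + \Delta^{-1}]^{-1}$. The function names and grid parameters are illustrative assumptions, and $R > 0$ is assumed so that $G_R$ has a density.

```python
import math

def i_f_two_point(k, r, a, n=40001, width=10.0):
    """Scalar I_F(K, R) of (7.3) for F = (delta_{+a} + delta_{-a}) / 2, R > 0."""
    c = 1.0 / (r + 1.0 / (a * a))          # [R + Delta^{-1}]^{-1}, Delta = a^2
    sd = math.sqrt(r)
    lo = -width * sd
    step = 2.0 * width * sd / (n - 1)
    pref = math.exp(-0.5 * r * a * a)      # common factor exp(-R a^2 / 2)
    total = 0.0
    for i in range(n):
        b = lo + i * step
        # Inner integrals over the two atoms of F, in closed form:
        num = pref * (a * math.sinh(a * b) - c * b * math.cosh(a * b))
        den = pref * math.cosh(a * b)      # this is Gamma(b; R)
        dens = math.exp(-0.5 * b * b / r) / math.sqrt(2.0 * math.pi * r)
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * (k * num) ** 2 / den * dens * step
    return total

def j_f_centered(r, delta):
    """Scalar J_F(R) from (7.10) when F is centered with variance delta."""
    return 1.0 / (r + 1.0 / delta)
```

In this sketch `i_f_two_point(1.0, r, a)` stays below `j_f_centered(r, a * a)`, as (7.9) requires, and it is strictly positive because the two-point distribution is not Gaussian.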

With Lemma 7.1 in hand, we now commence the study of the asymptotics of $\varepsilon$ through the representation (7.4). This requires that we study the behavior of $I_F$ under the joint asymptotic behavior of the sequences $Q^*$ and $R^*$. By using (2.3) on (7.3), we get

$$\lambda_{\min}(Q_t^{*\prime}Q^*_t)\,I_F(I_n, R^*_t) \le \varepsilon_t((A, H, \Sigma), F) \le \lambda_{\max}(Q_t^{*\prime}Q^*_t)\,I_F(I_n, R^*_t), \quad t = 1, 2, \ldots \tag{7.11}$$

thereby separating somewhat the effects of $Q^*$ from those of $R^*$. This directs our study towards an understanding of the behavior of $Q_t^{*\prime}Q^*_t$ and $I_F(I_n, R^*_t)$ for large times $t$ under different assumptions on the initial distribution $F$ and the system triple $(A, H, \Sigma)$.

Some general comments on the structure of the dynamics of the matrix-valued sequences $P$, $Q^*$ and $R^*$ are in order at this point. Firstly, the equation (5.21) for $P$ is a discrete-time Riccati equation. Since discrete-time Riccati equations have been extensively studied, there exists a fairly large body of results concerning the large-time asymptotics of $P$. Turning next to the sequence $Q^*$, we rewrite (6.5) as

$$Q^*_0 = I_n, \qquad Q^*_{t+1} = K_t Q^*_t, \quad t = 0, 1, \ldots \tag{7.12}$$

where we have set

$$K_t := A - [AP_tH' + \Sigma^{wv}]\,[HP_tH' + \Sigma^{v}]^{-1}H, \quad t = 0, 1, \ldots$$

Thus we expect $Q^*$ to exhibit some sort of exponential growth or decay which will depend primarily upon the spectrum of the limiting matrix $\lim_t K_t$, if this limit exists. As the existence of this limit defines an important situation, we formally introduce the following condition (C.1), where

(C.1): The limit $K_\infty := \lim_t K_t$ exists.

Of course, if $P_\infty := \lim_t P_t$ exists, then (C.1) holds true, and we have

$$K_\infty = A - [AP_\infty H' + \Sigma^{wv}]\,[HP_\infty H' + \Sigma^{v}]^{-1}H.$$

Finally, we can see more clearly what to expect of $R^*$ by rewriting (6.6) as

$$R^*_t = \sum_{s=0}^{t-1} Q_s^{*\prime}H'[HP_sH' + \Sigma^{v}]^{-1}HQ^*_s, \quad t = 1, 2, \ldots \tag{7.13}$$

We note here for future reference that $R^*$ is nondecreasing in the sense of the partial ordering on $\mathcal{Q}_n$; i.e., $v'R^*_tv \le v'R^*_{t+1}v$ for all $v$ in $\mathbb{R}^n$ and all $t = 0, 1, \ldots$. However, in general this does not imply convergence of the sequence $R^*$, which would be equivalent to the convergence of the sequence of cross-terms $\{u'R^*_tv,\ t = 0, 1, \ldots\}$ for all $u$ and $v$ in $\mathbb{R}^n$. That the previous convergence "on the diagonal" is not sufficient to ensure full convergence can be seen from the parallelogram identity, which here takes the form

$$(u + v)'R^*_t(u + v) - (u - v)'R^*_t(u - v) = 4u'R^*_tv, \quad u, v \in \mathbb{R}^n,\ t = 0, 1, \ldots$$

Therefore the cross-terms $u'R^*_tv$ will not converge unless both monotone sequences $(u + v)'R^*_t(u + v)$ and $(u - v)'R^*_t(u - v)$ have a finite limit. In view of these comments, we shall find the following to be a valuable hypothesis on the asymptotics of $R^*$:

(C.2): The sequence $R^*$ has a well-defined limit $R^*_\infty$ which is positive definite and thus invertible.

We observe from (7.13) that a natural situation in which (C.2) holds arises when the sequence $Q^*$ tends to zero, presumably at some exponential rate.

First  we  study  the  asymptotic  behavior  of  Q*.  We  make  rigorous  our 
earlier  comment  that  Q*  exponentially  grows  or  decays  with  rate  depending 
upon  Koo  whenever  (C.l)  holds. 

Proposition 7.2. Under (C.1), we have the following estimates: The upper bound

$$\limsup_t \frac{1}{t}\ln\lambda_{\max}(Q_t^{*\prime}Q^*_t) \le 2\ln\rho(K_\infty) \tag{7.14a}$$

always holds. The lower bound

$$2\ln\lambda_{\min}(K_\infty) \le \liminf_t \frac{1}{t}\ln\lambda_{\min}(Q_t^{*\prime}Q^*_t) \tag{7.14b}$$

holds provided either $K_\infty$ is singular or the matrices $\{K_t,\ t = 0, 1, \ldots\}$ are all invertible.

If $K_T$ is singular for some $T$, then by (7.12) $Q^*_t$ is singular for all $t > T$. In this case (7.14b) cannot hold unless $K_\infty$ is also singular. This is the reason for the two assumptions for (7.14b). In the analogous continuous-time calculations, $Q^*$ obeys a linear differential equation in $\mathcal{M}_n$, with the net result that $Q^*_t$ will be invertible for all $t$.

Proof of Proposition 7.2. First we develop the recursion (7.12) to obtain

$$Q^*_{t+1} = K_tK_{t-1}\cdots K_{s+1}Q^*_{s+1}, \quad s < t,\ s, t = 0, 1, \ldots \tag{7.15}$$

The natural basis of our arguments would be the following sequence of steps:

$$\limsup_t \frac{1}{t}\ln\lambda_{\max}(Q_t^{*\prime}Q^*_t) = 2\limsup_t \frac{1}{t}\ln\|Q^*_t\|_{op}$$

$$\le 2\limsup_t \frac{1}{t}\sum_{s=1}^{t}\ln\|K_s\|_{op} \tag{7.16}$$

$$= 2\ln\|K_\infty\|_{op}. \tag{7.17}$$

The first equality comes from (2.2) while (7.16) follows from (7.15) (with $s = 0$) with the help of standard properties of the operator norm; the passage to (7.17) is validated by invoking (C.1), the continuity of the operator norm and Cesaro convergence. Unfortunately, (7.17) falls short of the desired result (7.14a), which gives a tighter bound than (7.17) since in general

$$\rho(K_\infty) \le \|K_\infty\|_{op} \quad [31].$$

The arguments leading to (7.17) must therefore be modified. The missing step is to be found in a well-known fact from matrix theory [31, p. 271 and Thm. 3.8, p. 284] stating that

$$\lim_N \big(\|K_\infty^N\|_{op}\big)^{1/N} = \rho(K_\infty). \tag{7.18}$$

This suggests that we consider the evolution of $Q^*$ at time instants $\{0, N, 2N, \ldots\}$ for each $N = 1, 2, \ldots$. To do this, we fix an integer $N = 1, 2, \ldots$ and define the matrices $\{K_t^{(N)};\ t = 0, 1, \ldots\}$ by

$$K_t^{(N)} := K_{t+N}K_{t+N-1}\cdots K_{t+1}, \quad t = 0, 1, \ldots \tag{7.19}$$

Note that under (C.1), we have the convergence $\lim_t K_t^{(N)} = K_\infty^N$. By standard properties of the operator norm, we see from (7.15) and (7.19) that

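$$\limsup_j \frac{1}{j}\ln\|Q^*_{jN}\|_{op} \le \limsup_j \frac{1}{j}\sum_{i=1}^{j-1}\ln\|K_{iN-1}^{(N)}\|_{op} = \ln\|K_\infty^N\|_{op}, \tag{7.20}$$

where the last equality in (7.20) is obtained by Cesaro convergence since $\lim_t \|K_t^{(N)}\|_{op} = \|K_\infty^N\|_{op}$.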
To pass from the estimate (7.20) on the lattice $\{0, N, 2N, \ldots\}$ to a corresponding estimate on $\{0, 1, \ldots\}$, we define

$$j_N(t) := \lfloor t/N \rfloor, \quad N = 1, 2, \ldots,\ t = 0, 1, \ldots$$

where $\lfloor\cdot\rfloor$ is the integer floor function; for each $t = 0, 1, \ldots$, $j_N(t)$ is the unique integer such that

$$j_N(t)N \le t < (j_N(t) + 1)N, \quad N = 1, 2, \ldots \tag{7.21}$$

For each fixed integer $N = 1, 2, \ldots$, we readily conclude from (7.15) (with $s = j_N(t)N$) that

_ i  _  i  i 

lim*  -  In  ||Q*||0p  <  limj  -  |  In  ||iiffc||0?>j 

fe=jw(t)+i 

+  lto,  ^^1n||Q*„(1)w|UP.  (7.22) 

Using  (7.21),  we  observe  that 

7t_.S+]  S  I  (7.23) 


and 


Since $\lim_t j_N(t) = \infty$ monotonically, it is now apparent from (7.22)-(7.24) that

$$\limsup_t \frac{1}{t}\ln\|Q^*_t\|_{op} \le \frac{1}{N}\limsup_j \frac{1}{j}\sum_{i=1}^{j-1}\ln\|K_{iN-1}^{(N)}\|_{op} = \frac{1}{N}\ln\|K_\infty^N\|_{op}, \tag{7.25}$$

where the last equality follows from (7.20).

Now letting $N$ go to infinity in (7.25) and using (7.18), we finally get the desired estimate (7.14a) via (2.2).
The proof of (7.14b) is similar to that of (7.14a) if all the matrices $\{K_t;\ t = 0, 1, \ldots\}$ are invertible: Indeed, under such circumstances, all the matrices in the sequence $Q^*$ are invertible and we can write the following recursion

$$((Q^*_0)^{-1})' = I_n, \qquad ((Q^*_{t+1})^{-1})' = (K_t^{-1})'((Q^*_t)^{-1})', \quad t = 0, 1, \ldots \tag{7.26}$$

for the inverse transposed matrices; this recursion has essentially the same form as the recursion (7.12) for the sequence $Q^*$. Therefore, if in addition $K_\infty$ is invertible, then the sequence of matrices $\{(K_t^{-1})';\ t = 0, 1, \ldots\}$ (which plays the role of the sequence $K$ for (7.26)) satisfies condition (C.1), i.e., it has a limit with $\lim_t (K_t^{-1})' = (K_\infty^{-1})'$. Consequently, the arguments which validated (7.14a) in the first part of the proof apply to give

$$\limsup_t \frac{1}{t}\ln\|((Q^*_t)^{-1})'\|_{op}^2 \le 2\ln\rho((K_\infty^{-1})')$$

$$= 2\ln\rho(K_\infty^{-1})$$

$$= 2\ln\lambda_{\min}(K_\infty)^{-1} \tag{7.27}$$

where the last two equalities readily follow from well-known facts from matrix theory [31]. Next, invoking (2.2) again, we also have

$$\|((Q^*_t)^{-1})'\|_{op}^2 = \lambda_{\max}\big((Q^*_t)^{-1}((Q^*_t)^{-1})'\big)$$

$$= \lambda_{\max}\big((Q_t^{*\prime}Q^*_t)^{-1}\big)$$

$$= \big(\lambda_{\min}(Q_t^{*\prime}Q^*_t)\big)^{-1}, \quad t = 0, 1, \ldots \tag{7.28}$$

Combining (7.27) and (7.28), we obtain that (7.14b) holds when the matrices $\{K_t;\ t = 0, 1, \ldots\}$ and $K_\infty$ are all invertible. In the other case, when $K_\infty$ is singular, (7.14b) holds trivially. This covers all the cases in the hypothesis for the lower bound (7.14b). ■
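
The gap between (7.17) and (7.14a), and the way (7.18) closes it, can be seen numerically. The Python sketch below uses a hypothetical $2 \times 2$ matrix standing in for $K_\infty$ whose operator norm exceeds one although its spectral radius is $0.5$, and evaluates $\|K^N\|_{op}^{1/N}$ for increasing $N$. The helper names and the example matrix are illustrative assumptions; the operator norm is obtained from the closed-form largest eigenvalue of the symmetric $2 \times 2$ matrix $M'M$.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm2(m):
    """Operator norm of a real 2x2 matrix: sqrt of the largest eigenvalue of m'm."""
    p = m[0][0] ** 2 + m[1][0] ** 2                 # (m'm)[0][0]
    q = m[0][0] * m[0][1] + m[1][0] * m[1][1]       # (m'm)[0][1]
    s = m[0][1] ** 2 + m[1][1] ** 2                 # (m'm)[1][1]
    lam_max = 0.5 * (p + s) + math.sqrt(0.25 * (p - s) ** 2 + q * q)
    return math.sqrt(lam_max)

def root_norm_of_power(k, n):
    """Return ||k^n||_op ** (1/n), the quantity appearing in (7.18)."""
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        power = matmul2(power, k)
    return op_norm2(power) ** (1.0 / n)

# Hypothetical stand-in for K_infinity: both eigenvalues equal 0.5.
K = [[0.5, 1.0], [0.0, 0.5]]
```

Here `op_norm2(K)` is larger than one, so (7.17) alone would not even show decay of $Q^*$, whereas `root_norm_of_power(K, N)` decreases toward the spectral radius $0.5$ as `N` grows.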

Proposition  7.2  gives  part  of  the  asymptotic  analysis  suggested  by 
(7.11).  The  following  result  helps  complete  the  picture. 

Proposition 7.3. For every distribution $F$ in $\mathcal{D}^0(\mathbb{R}^n)$, we have

$$\sup_t I_F(I_n, R^*_t) < \infty.$$

Proof. Note that we have not required condition (C.2) to hold. In light of the bound (7.9), it is sufficient to show that

$$\limsup_t J_F(R^*_t) < \infty \tag{7.29}$$

for all distributions $F$ in $\mathcal{D}^0(\mathbb{R}^n)$. The functional $J_F$ being continuous on $\mathcal{Q}_n$, we clearly have (7.29) under the assumption (C.2). When (C.2) does not hold in the scalar case (i.e., $n = 1$), $R^*$ is a scalar sequence, and we may carry out some simple calculations on (7.10) to show (7.29). Some analogous but more complicated manipulations of (7.10) also yield (7.29) in the multivariable case when (C.2) does not hold; details are omitted for the sake of brevity. ■

Combining  Proposition  7.3  with  (7.11)  and  the  estimate  (7.14a)  yields 
the  following  upper  bound. 

Theorem 7.4. If assumption (C.1) holds, then

$$\limsup_t \frac{1}{t}\ln\varepsilon_t \le 2\ln\rho(K_\infty).$$

The analogous asymptotic lower bound is made a bit more complicated by the possibility that we may have $\lim_t I_F(I_n, R^*_t) = 0$. In the next section, we prove the following "converse".

Proposition 7.5. If assumption (C.2) holds, then $\lim_t I_F(I_n, R^*_t) = 0$ implies that the distribution $F$ of the initial condition is Gaussian.

Collecting this result and the lower bound of (7.14b) gives the following lower bound.

Theorem 7.6. If assumptions (C.1) and (C.2) hold, then for every non-Gaussian initial distribution $F$ in $\mathcal{D}^0(\mathbb{R}^n)$, we have

$$\liminf_t \frac{1}{t}\ln\varepsilon_t \ge 2\ln\lambda_{\min}(K_\infty)$$

if either $K_\infty$ is singular or all the matrices $\{K_t,\ t = 0, 1, \ldots\}$ are invertible.

Recall that in the case where $F$ is actually Gaussian, the MMSE and LLSE estimators agree so that $\varepsilon_t = 0$ for all $t = 1, 2, \ldots$.

We  now  present  some  straightforward  implications  of  Theorems  7.4 
and  7.6. 

Theorem 7.7. Assume (C.1). If $\rho(K_\infty) < 1$, then

(a) The sequence $Q^*$ converges to zero;

(b) The sequence $R^*$ has a well-defined limit $R^*_\infty$; and

(c) For all distributions $F$ in $\mathcal{D}(\mathbb{R}^n)$, the convergence $\lim_t \varepsilon_t = 0$ takes place at least exponentially fast according to Theorem 7.4.

Proof. Claim (a) is almost a direct consequence of (7.14a), (2.3) and of the equivalence of norms on $\mathcal{M}_n$. To prove Claim (b), we should note via (7.13) that


$$\|R^*_t - R^*_s\|_{op} \le \|H\|_{op}^2\,\|(\Sigma^{v})^{-1}\|_{op}\sum_{r=s}^{t-1}\|Q^*_r\|_{op}^2, \quad s < t,\ s, t = 0, 1, \ldots$$
Since under assumption (C.1), the convergence $\lim_t \|Q^*_t\|_{op} = 0$ is exponentially fast when $\rho(K_\infty) < 1$, the sequence $R^*$ is thus Cauchy and hence convergent in $\mathcal{M}_n$ under the conditions of the theorem. Claim (c) is simply a quantitative interpretation of Theorem 7.4. ■
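
The exponential rate in Theorem 7.7 can be observed directly in the scalar, time-invariant case, where $\rho(K_\infty) = |K_\infty|$. The Python sketch below iterates the Riccati recursion (5.21) together with (7.12) and compares the empirical rate $\frac{1}{t}\ln (Q^*_t)^2$ with $2\ln|K_t|$ at the final step, used here as a proxy for $2\ln\rho(K_\infty)$. The function name `decay_rate` and the parameter values in the usage note are hypothetical, and the sketch assumes $K_t \neq 0$ throughout.

```python
import math

def decay_rate(a, h, sw, sv, swv, steps):
    """Scalar sketch: empirical rate (1/t) ln (Q*_t)^2 versus 2 ln |K_t|.

    Iterates (5.21) for P_t and accumulates ln|Q*_t| via (7.12).
    Returns (empirical_rate, 2 * ln|K_last|).
    """
    p, log_q = 0.0, 0.0          # P_0 = 0 and ln|Q*_0| = ln 1 = 0
    k = a
    for _ in range(steps):
        j = h * p * h + sv       # H P_t H' + Sigma^v
        g = a * p * h + swv      # A P_t H' + Sigma^{wv}
        k = a - g * h / j        # K_t of (7.12)
        log_q += math.log(abs(k))            # ln|Q*_{t+1}| = ln|K_t| + ln|Q*_t|
        p = a * p * a + sw - g * g / j       # Riccati step (5.21)
    return 2.0 * log_q / steps, 2.0 * math.log(abs(k))
```

For example, `decay_rate(0.9, 1.0, 1.0, 1.0, 0.0, 2000)` returns two negative numbers that agree closely, consistent with $Q^*_t$ decaying like $K_\infty^t$ in this example.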

Given the enormous literature on the discrete-time Riccati equation, we should expect that there are some well-known conditions under which the hypotheses of Theorem 7.7 hold. We conclude this section with these results:

Theorem 7.8. If the pair $(A, H)$ is detectable, then (C.1) holds. If in addition, the pair $(\bar{A}, C)$ is stabilizable, where

$$\bar{A} := A - \Sigma^{wv}(\Sigma^{v})^{-1}H \quad \text{and} \quad C := \Sigma^{w} - \Sigma^{wv}(\Sigma^{v})^{-1}\Sigma^{vw},$$

then condition (C.1) holds and the matrix $K_\infty$ is asymptotically stable, i.e., $\rho(K_\infty) < 1$.

Proof. The first claim is Theorem 5.2(b) of [4, p. 172], while the second claim follows from Theorem 5.3 of [4, p. 175]. ■

It is worth pointing out that the matrix $C$ may be interpreted as the conditional covariance of $W^0_{t+1}$ given $V^0_{t+1}$ for each $t = 0, 1, \ldots$, and hence is positive semi-definite. Essentially, Theorem 7.8 gives a set of conditions under which the mean-square MMSE-LLSE error tends to zero. Note however that the hypotheses do not imply that either the MMSE or LLSE estimators provide good estimates of the plant process $X^0$. Indeed, to elaborate somewhat upon this point, assume the stronger hypotheses that the pair $(A, H)$ is detectable and the pair $(\bar{A}, C^{1/2})$ is controllable. Then it is a classical result [5] that the recursion (1.7) for the error covariances associated with the LLSE estimates has a well-defined and positive definite limit $P^K_\infty$ which is the unique solution of the steady-state Riccati equation

$$P^K_\infty = AP^K_\infty A' + \Sigma^{w} - [AP^K_\infty H' + \Sigma^{wv}]\,[HP^K_\infty H' + \Sigma^{v}]^{-1}\,[AP^K_\infty H' + \Sigma^{wv}]'.$$

Hence, under these stronger hypotheses, we conclude that

$$\lim_t E^0\big[\|X^K_{t+1} - X^0_{t+1}\|^2\big] = \mathrm{trace}(P^K_\infty) > 0. \tag{7.30}$$

Moreover, by the 'orthogonality principle' for MMSE estimators [2, Sec. 2.3], we readily get that

$$E^0\big[\|X^K_{t+1} - X^0_{t+1}\|^2\big] = E^0\big[\|\hat{X}_{t+1} - X^0_{t+1}\|^2\big] + \varepsilon_{t+1}, \quad t = 0, 1, \ldots \tag{7.31}$$

Now by combining (7.30) and the results of Theorems 7.7 and 7.8, we see from (7.31) that the relation

$$\lim_t E^0\big[\|\hat{X}_{t+1} - X^0_{t+1}\|^2\big] = \mathrm{trace}(P^K_\infty) > 0$$

also holds. Thus, under some appropriate assumptions, the performance of the LLSE and MMSE estimators is (essentially) equivalent for large times, i.e., nothing is (asymptotically) lost by using the simpler LLSE estimator rather than the more computationally demanding MMSE estimator.


VIII.  A  PARTIAL  CONVERSE 


This rather short section is devoted solely to the proof of Proposition 7.5. Before starting the discussion, we observe from (7.11) that a sufficient condition for $\lim_t I_F(I_n, R^*_t) = 0$ is that $\lambda_{\min}(Q_t^{*\prime}Q^*_t) > 0$ for all $t$ large enough and

$$\lim_t \frac{\varepsilon_t((A, H, \Sigma), F)}{\lambda_{\min}(Q_t^{*\prime}Q^*_t)} = 0.$$

We suggest the following probabilistic interpretation for this: Whenever $\lambda_{\min}(Q_t^{*\prime}Q^*_t) > 0$ for all $t$ large enough, if $\varepsilon$ decays quickly enough, then $\xi$ must in fact be Gaussian, in which case $\varepsilon_t$ is identically zero for all $t = 1, 2, \ldots$. This not only provides an indirect characterization of the initial condition as a Gaussian or non-Gaussian rv, but also may be viewed as some form of phase transition as we vary the initial condition between Gaussian and non-Gaussian rvs.

Proof of Proposition 7.5. We base our study of the implications of
$\lim_t I_F(I_n, R^*_t) = 0$ on the formula (7.3).

A natural first step is to try to interchange the integration operations
of (7.3) with the limiting operation of letting $R^*_t$ tend to $R^*_\infty$.
To do this, we must study the behavior of both the integrand in (7.3) with
$K = I_n$ and $R = R^*_t$, and the measures $\{G_{R^*_t};\ t = 1,2,\ldots\}$
as we let $t$ tend to infinity. We consider first the more difficult question
of the probability measures $\{G_{R^*_t};\ t = 1,2,\ldots\}$. Assumption (C.2),
which is that the sequence $R^*_t$ has a well-defined and invertible limit
$R^*_\infty$, allows us to control the behavior of these probability measures
by comparing them to Lebesgue measure $\lambda$ on $\mathbb{R}^n$. Since the
set of invertible $n \times n$ real matrices is an open subset of $M_n$
[10, Thm. X.2.1], assumption (C.2) implies the existence of a finite $T$ such
that for $t = T, T+1, \ldots$ the matrix $R^*_t$ is invertible and the
probability measure $G_{R^*_t}$ is thus absolutely continuous with respect to
$\lambda$ for all such $t$. Fatou's lemma and the condition
$\lim_t I_F(I_n, R^*_t) = 0$ then immediately imply


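$$\liminf_t \frac{\left\| \int_{\mathbb{R}^n} \{ z - [R^*_t + \Delta^{-1}]^{-1} b \}\, \gamma(z,b;R^*_t)\, dF(z) \right\|^2}{\Gamma(b;R^*_t)} \cdot \frac{dG_{R^*_t}}{d\lambda}(b) = 0$$

for $\lambda$-almost all $b$ in $\mathbb{R}^n$. Under (C.2), we find

$$\lim_t \frac{dG_{R^*_t}}{d\lambda}(b) = \frac{dG_{R^*_\infty}}{d\lambda}(b) > 0, \qquad b \in \mathbb{R}^n,$$

while by dominated convergence with the bound (7.6), we also have that

$$\lim_t \Gamma(b;R^*_t) = \Gamma(b;R^*_\infty) > 0, \qquad b \in \mathbb{R}^n.$$

This allows us to focus on the numerator of the integrand of (7.3) and thus
to conclude that if $\lim_t I_F(I_n, R^*_t) = 0$, then

$$\lim_t \int_{\mathbb{R}^n} \{ z - [R^*_t + \Delta^{-1}]^{-1} b \}\, \gamma(z,b;R^*_t)\, dF(z) = 0$$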


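for $\lambda$-almost all $b$ in $\mathbb{R}^n$. Appealing once more to
dominated convergence and (7.6), we see that

$$\int_{\mathbb{R}^n} z\, \gamma(z,b;R^*_\infty)\, dF(z) = [R^*_\infty + \Delta^{-1}]^{-1}\, b \int_{\mathbb{R}^n} \gamma(z,b;R^*_\infty)\, dF(z) \qquad (8.1)$$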


for  A-almost  all  b  in  Rn.  This  equation  clearly  enforces  some  constraints 
upon  the  distribution  F — to  complete  the  proof  of  Proposition  7.5  we  shall 
show  that  in  fact  they  imply  that  F  is  Gaussian. 

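The implications of (8.1) are not directly obvious. Things become a
bit clearer, however, upon expanding $\gamma(z,b;R^*_\infty)$; we get that

$$\int_{\mathbb{R}^n} z \exp\left[z'b - \tfrac12 z'R^*_\infty z\right] dF(z) = [R^*_\infty + \Delta^{-1}]^{-1}\, b \int_{\mathbb{R}^n} \exp\left[z'b - \tfrac12 z'R^*_\infty z\right] dF(z) \qquad (8.2)$$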


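for $\lambda$-almost all $b$ in $\mathbb{R}^n$. If we define an auxiliary
probability measure $\tilde F$ on $\mathbb{R}^n$ which is absolutely
continuous with respect to $F$ via its Radon-Nikodym derivative

$$\frac{d\tilde F}{dF}(z) = \frac{\exp\left[-\tfrac12 z'R^*_\infty z\right]}{\int_{\mathbb{R}^n} \exp\left[-\tfrac12 z'R^*_\infty z\right] dF(z)}, \qquad z \in \mathbb{R}^n, \qquad (8.3)$$

then (8.2) becomes

$$\int_{\mathbb{R}^n} z \exp[z'b]\, d\tilde F(z) = [R^*_\infty + \Delta^{-1}]^{-1}\, b \int_{\mathbb{R}^n} \exp[z'b]\, d\tilde F(z),$$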

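for $\lambda$-almost all $b$ in $\mathbb{R}^n$. Rewriting (8.1) in this way
now suggests an analysis in terms of the moment generating function $N$ of
$\tilde F$, where

$$N(b) := \int_{\mathbb{R}^n} \exp[z'b]\, d\tilde F(z) = \int_{\mathbb{R}^n} \frac{\gamma(z,b;R^*_\infty)}{\Gamma(0;R^*_\infty)}\, dF(z), \qquad b \in \mathbb{R}^n. \qquad (8.4)$$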

By (7.6) and some straightforward technical calculations [27, Sec. VI.2],
we see that $N$ is well defined and differentiable on all of $\mathbb{R}^n$,
with

$$\nabla N(b) = \int_{\mathbb{R}^n} z\, \frac{\gamma(z,b;R^*_\infty)}{\Gamma(0;R^*_\infty)}\, dF(z), \qquad b \in \mathbb{R}^n.$$

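Comparing this last fact with (8.1) and (8.4), we find that $N$ satisfies the
differential equation

$$N(0) = 1, \qquad \nabla N(b) = [R^*_\infty + \Delta^{-1}]^{-1}\, b\, N(b), \qquad b \in \mathbb{R}^n,$$

and by the uniqueness of solutions of ordinary differential equations we get

$$N(b) = \exp\left[\tfrac12\, b'[R^*_\infty + \Delta^{-1}]^{-1} b\right], \qquad b \in \mathbb{R}^n.$$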
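
The appeal to uniqueness can be made explicit by restricting $N$ to rays
through the origin. In the sketch below, $M$ is shorthand for
$[R^*_\infty + \Delta^{-1}]^{-1}$ and $g$ denotes the restriction of $N$ to a
ray; both names are introduced here for illustration only and are not part of
the original argument.

```latex
% Shorthand (assumed here): M := [R^*_\infty + \Delta^{-1}]^{-1}, a symmetric matrix.
% Fix b in R^n and restrict N to the ray s -> s b.
\begin{align*}
g(s) &:= N(sb), \qquad s \in \mathbb{R}, \\
g'(s) &= b'\,\nabla N(sb) = s\,(b'Mb)\,g(s), \qquad g(0) = N(0) = 1, \\
g(s) &= \exp\!\left[\tfrac12 s^2\, b'Mb\right]
  \quad\Longrightarrow\quad
  N(b) = g(1) = \exp\!\left[\tfrac12\, b'Mb\right].
\end{align*}
```

Conversely, differentiating the closed form gives $\nabla N(b) = Mb\,N(b)$
because $M$ is symmetric, so the candidate does solve the system.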


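By the uniqueness of moment generating functions, the distribution $\tilde F$
is now completely characterized: it is a Gaussian distribution with zero mean
and covariance $[R^*_\infty + \Delta^{-1}]^{-1}$. The matrix
$R^*_\infty + \Delta^{-1}$ being positive definite, this implies that
$\tilde F$ is absolutely continuous with respect to $\lambda$, with
Radon-Nikodym derivative

$$\frac{d\tilde F}{d\lambda}(z) = \frac{\exp\left[-\tfrac12 z'[R^*_\infty + \Delta^{-1}] z\right]}{(2\pi)^{n/2} \sqrt{\det\left[(R^*_\infty + \Delta^{-1})^{-1}\right]}}, \qquad z \in \mathbb{R}^n. \qquad (8.5)$$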


To complete the proof of the proposition, we now must show that since
$\tilde F$ is Gaussian, so is the original distribution $F$. An obvious
implication of (8.3) and of the absolute continuity of $\tilde F$ with respect
to $\lambda$ is that $F$ is also absolutely continuous with respect to
$\lambda$. In fact, combining (8.3) and (8.5), we get the formula

$$\frac{dF}{d\lambda}(z) = \frac{dF}{d\tilde F}(z) \cdot \frac{d\tilde F}{d\lambda}(z) = c \cdot \exp\left[-\tfrac12 z'\Delta^{-1} z\right], \qquad z \in \mathbb{R}^n,$$

for some positive constant $c$, since the factor
$\exp[\tfrac12 z'R^*_\infty z]$ coming from (8.3) cancels the corresponding
term in the exponent of (8.5). This completes our proof: $F$ is Gaussian. ∎

IX. THE SCALAR CASE


In this section we continue the analysis of Section VII, here focusing
exclusively on the scalar case $n = k = 1$. We shall see that a number of
simplifications of (5.21), (6.5), (6.6) and (7.3) are now possible, allowing a
much more complete study of the asymptotics of $\varepsilon$.

To emphasize the scalar nature of our calculations and to conform
to standard notation, we shall use lower case letters for all of our scalar
quantities. Thus, with $\sigma^v > 0$, the system triple now is
$(a, h, \Sigma)$, where $\Sigma$ collects the scalar noise covariances
$\sigma^w$, $\sigma^{wv}$ and $\sigma^v$, and the variance of the initial
condition is $\delta$. Proposition 7.8 suggests that the quantities

$$\tilde a := a - \frac{\sigma^{wv} h}{\sigma^v} \qquad \text{and} \qquad c := \sigma^w - \frac{(\sigma^{wv})^2}{\sigma^v}$$

will play a fundamental role in our ensuing analysis. We can more immediately
see the utility of these quantities in the scalar case by rewriting (5.21),
(6.5) and (6.6) in the forms


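$$p_0 = 0, \qquad p_{t+1} = \frac{\tilde a^2 \sigma^v p_t}{h^2 p_t + \sigma^v} + c, \qquad t = 0,1,\ldots \qquad (9.1)$$

$$q^*_0 = 1, \qquad q^*_{t+1} = k_t\, q^*_t, \qquad t = 0,1,\ldots \qquad (9.2)$$

with

$$k_t := \frac{\tilde a \sigma^v}{h^2 p_t + \sigma^v}, \qquad t = 0,1,\ldots \qquad (9.3)$$

and

$$r^*_0 = 0, \qquad r^*_{t+1} = r^*_t + \frac{(q^*_t)^2 h^2}{h^2 p_t + \sigma^v}, \qquad t = 0,1,\ldots \qquad (9.4)$$

We also have in the scalar case the advantage that the upper and lower
bounds of (7.11) collapse into the equality

$$\varepsilon_t = (q^*_t)^2\, I_F(1, r^*_t), \qquad t = 1,2,\ldots \qquad (9.5)$$

which holds for all $F$ in $\mathcal{D}^0(\mathbb{R})$.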
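
As a quick sanity check, the scalar recursions (9.1)-(9.4) can be iterated
numerically. The sketch below is illustrative only: the function name
`scalar_recursions` and its argument names are hypothetical, and the gain is
assumed to be $k_t = \tilde a \sigma^v / (h^2 p_t + \sigma^v)$, the form
consistent with the limit $k_\infty$ used later in this section.

```python
def scalar_recursions(a_tilde, h, sigma_v, c, steps):
    """Iterate the scalar recursions (9.1)-(9.4) starting from p_0 = 0,
    q_0 = 1, r_0 = 0. Returns the lists p, q, r, each of length steps + 1.
    The gain formula is an assumption (see the accompanying text)."""
    p, q, r = [0.0], [1.0], [0.0]
    for _ in range(steps):
        denom = h * h * p[-1] + sigma_v
        k = a_tilde * sigma_v / denom                 # gain k_t, as in (9.3)
        r.append(r[-1] + q[-1] ** 2 * h * h / denom)  # (9.4), uses q_t and p_t
        q.append(k * q[-1])                           # (9.2)
        p.append(a_tilde ** 2 * sigma_v * p[-1] / denom + c)  # (9.1)
    return p, q, r
```

With $c = 0$ and $\tilde a = 1$ the iterates reduce to $p_t = 0$,
$(q^*_t)^2 = 1$ and $r^*_t = h^2 t / \sigma^v$; for instance,
`scalar_recursions(1.0, 2.0, 0.5, 0.0, 5)` yields `r[5]` equal to 40.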

A natural way to organize our efforts here is to taxonomize the different
possible cases based upon Proposition 7.8, that is, upon the detectability
of $(a, h)$ and the stabilizability of $(\tilde a, c^{1/2})$. A slightly more
direct classification scheme of the scalar problem, however, will emerge,
yielding four possibilities parametrized by $h$, $\tilde a$ and $c$.


Our  first  case  is  an  obvious  degeneracy: 


Proposition 9.1. If either $\tilde a = 0$ or $h = 0$, then $\varepsilon_t = 0$
for all $t = 1,2,\ldots$ and all distributions $F$ in $\mathcal{D}(\mathbb{R})$.

Proof. Fix $F$ in $\mathcal{D}^0(\mathbb{R})$. If $\tilde a = 0$, then
$q^*_t = 0$ for all $t = 1,2,\ldots$ by (9.2) and (9.3), and therefore (9.5)
implies $\varepsilon_t = 0$ for all $t = 1,2,\ldots$. On the other hand, if
$h = 0$, then (9.4) yields $r^*_t = 0$ for all $t = 0,1,\ldots$, so again
$\varepsilon_t = 0$ for all $t = 1,2,\ldots$, this time by direct evaluation
of (7.3). We translate these results from $\mathcal{D}^0(\mathbb{R})$ to
$\mathcal{D}(\mathbb{R})$ by making use of the translation arguments of
(7.1)-(7.2). ∎


We now consider the more interesting situation where both $\tilde a \ne 0$
and $h \ne 0$, in which case $q^*_t \ne 0$, $r^*_t > 0$ and
$\varepsilon_t \ge 0$ for all $t = 1,2,\ldots$. We rewrite (9.1) as
$p_{t+1} = T(p_t)$ where the mapping $T : [0,\infty) \to \mathbb{R}$ is given
by

$$T(p) := \frac{\tilde a^2 \sigma^v p}{h^2 p + \sigma^v} + c, \qquad p \ge 0. \qquad (9.6)$$

Note that since

$$T'(p) = \frac{\tilde a^2 (\sigma^v)^2}{(h^2 p + \sigma^v)^2}, \qquad p \ge 0,$$

the mapping $T$ is concave and nondecreasing on $[0,\infty)$. Hence, the
iterates $\{p_t,\ t = 0,1,\ldots\}$ form a nondecreasing and thus convergent
sequence with limit point $p_\infty$ in $[0,\infty]$. The finiteness of
$p_\infty$ is an easy consequence of the relation $p_\infty = T(p_\infty)$,
which must necessarily hold.

Consequently, the sequence $k$ has a limit $k_\infty$ given by

$$k_\infty := \frac{\tilde a \sigma^v}{h^2 p_\infty + \sigma^v}$$

with $|k_\infty| > 0$ since $\tilde a \ne 0$ and $p_\infty < \infty$. By
Cesaro convergence, this implies via (9.2) that

$$\lim_t \frac{1}{t} \ln (q^*_t)^2 = 2 \ln |k_\infty|.$$

It is then easy to see from (7.13) that if $|k_\infty| < 1$, then
$r^*_\infty := \lim_t r^*_t$ is well defined and finite, whereas if
$|k_\infty| > 1$, then $\lim_t r^*_t = \infty$. These observations give us the
second case of interest:
Proposition 9.2. We assume both $h \ne 0$ and $\tilde a \ne 0$. If either
$c \ne 0$ or $c = 0$ with $|\tilde a| < 1$, then $|k_\infty| < 1$ and
$\lim_t \varepsilon_t = 0$ with
$\lim_t \frac{1}{t} \ln \varepsilon_t = 2 \ln |k_\infty| < 0$
for all non-Gaussian distributions $F$ in $\mathcal{D}(\mathbb{R})$.

Proof. Prompted by the remarks made earlier, we begin by showing that
$|k_\infty| < 1$ under the stated conditions. If $c = 0$, then $p_t = 0$ for
all $t = 0,1,\ldots$, so that $p_\infty = 0$ and the conclusion
$|k_\infty| \le |\tilde a| < 1$ follows when $|\tilde a| < 1$. If $c \ne 0$,
then necessarily $c > 0$ by the first remark following Theorem 7.8 and
therefore $p_\infty > 0$ (since $c = p_1 \le p_\infty$). Consequently,
$p_\infty$ is the only finite solution to the fixed point equation $T(p) = p$
on $(0,\infty)$, and geometric considerations based on the concavity and
monotonicity of $T$ readily lead to $T'(p_\infty) < 1$. The conclusion
$|k_\infty| < 1$ now follows from the relation $T'(p_\infty) = k_\infty^2$.

As pointed out earlier, if both $h \ne 0$ and $\tilde a \ne 0$, then
$q^*_t \ne 0$ and $r^*_t > 0$ for all $t = 1,2,\ldots$, whence
$r^*_\infty > 0$ since $\{r^*_t,\ t = 0,1,\ldots\}$ is an increasing sequence.
On the other hand, we saw earlier that $|k_\infty| < 1$ implies
$r^*_\infty < \infty$. Therefore, from Propositions 7.3 and 7.5, we obtain
$0 < \liminf_t I_F(1, r^*_t) \le \limsup_t I_F(1, r^*_t) < \infty$ for every
non-Gaussian $F$ in $\mathcal{D}^0(\mathbb{R})$. As a result,
$\lim_t \frac{1}{t} \ln \varepsilon_t = \lim_t \frac{1}{t} \ln (q^*_t)^2 = 2 \ln |k_\infty| < 0$
for all $F$ non-Gaussian in $\mathcal{D}^0(\mathbb{R})$, and thus in
$\mathcal{D}(\mathbb{R})$ by translation. ∎
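
The fixed-point argument in this proof can be checked numerically. The sketch
below is a minimal illustration under stated assumptions: the helper names
(`T`, `T_prime`, `p_infinity`, `iterate_T`) are hypothetical, the map is taken
to be $T(p) = \tilde a^2 \sigma^v p / (h^2 p + \sigma^v) + c$, and the closed
form is the positive root of the quadratic obtained by clearing denominators
in $T(p) = p$, which is only meaningful for $c > 0$.

```python
import math

def T(p, a_tilde, h, sigma_v, c):
    """One step of the scalar Riccati map, p_{t+1} = T(p_t)."""
    return a_tilde ** 2 * sigma_v * p / (h * h * p + sigma_v) + c

def T_prime(p, a_tilde, h, sigma_v):
    """Derivative of T; at p_infinity it equals k_infinity squared."""
    return (a_tilde * sigma_v / (h * h * p + sigma_v)) ** 2

def p_infinity(a_tilde, h, sigma_v, c):
    """Positive root of h^2 p^2 + b p - c*sigma_v = 0 (assumes c > 0, h != 0),
    where b = sigma_v * (1 - a_tilde^2) - c * h^2."""
    b = sigma_v * (1.0 - a_tilde ** 2) - c * h * h
    return (-b + math.sqrt(b * b + 4.0 * h * h * c * sigma_v)) / (2.0 * h * h)

def iterate_T(a_tilde, h, sigma_v, c, steps):
    """Iterate T starting from p_0 = 0."""
    p = 0.0
    for _ in range(steps):
        p = T(p, a_tilde, h, sigma_v, c)
    return p
```

For the illustrative values $\tilde a = 1.5$, $h = 1$, $\sigma^v = 1$ and
$c = 0.5$, the closed form gives $p_\infty = 2$ and $T'(p_\infty) = 0.25$, so
$|k_\infty| = 0.5 < 1$ even though $|\tilde a| > 1$.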

Notice that Proposition 9.2 is almost a direct consequence of Theorems 7.4
and 7.6 since in the scalar case, we have
$\lambda_{\min}(k_\infty) = \rho(k_\infty) = |k_\infty|$, and we
thus  would  need  only  to  establish  that  conditions  (C.l)  and  (C.2)  hold 
true  under  the  assumptions  of  Proposition  9.2.  However,  we  found  the  more 
direct  argument  involving  (9.6)  to  be  an  interesting  calculation  tailored  to 
the  scalar  case. 

It now remains to investigate the case $c = 0$ and $|\tilde a| \ge 1$, still
with $h \ne 0$. We shall see that in this case the initial state distribution
$F$ has a nontrivial effect on the large time asymptotics of $\varepsilon$.
A priori, it would seem natural that the initial distribution $F$ should have
some effect on the asymptotics of the mean squared error between the MMSE and
LLSE filters. However, in both cases considered thus far in Propositions 9.1
and 9.2, the effect of the system parameters $(a, h, \Sigma)$ has dominated
these asymptotics. Only when $c = 0$ and $|\tilde a| \ge 1$ does $F$ have a
significant effect. We shall establish this dependence by giving a complete
analysis for two specific initial distributions $F$, and by noting the
different asymptotics of $\varepsilon$. We first verify a general result which
complements Proposition 7.5.

Lemma 9.3. For any distribution $F$ in $\mathcal{D}(\mathbb{R})$, we have
$I_F(1, r) \le 1/r$ for all $r > 0$, so that $\lim_r I_F(1, r) = 0$.

Proof. Since the functional $I_F(1, \cdot)$ is independent of the system
triple $(a, h, \Sigma)$, we can study it by choosing the system (1.1a)-(1.1b)
at our convenience. In particular, we shall take the system (1.1a)-(1.1b) to
be

$$x^0_t = \xi, \qquad y_t = \xi + v^0_{t+1}, \qquad t = 0,1,\ldots$$

i.e., $a = h = 1$ and $\sigma^w = \sigma^{wv} = 0$, in which case
$q^*_t = 1$, $r^*_t = t/\sigma^v$ and $\varepsilon_t = I_F(1, t/\sigma^v)$ for
all $t = 0,1,\ldots$. We now define a process
$\tilde x = \{\tilde x_t;\ t = 1,2,\ldots\}$ of linear estimates of the
process $x^0$ on the basis of $y$ as

$$\tilde x_{t+1} := \frac{1}{t+1} \sum_{s=0}^{t} y_s, \qquad t = 0,1,\ldots$$

Invoking (7.31), we get that

$$\varepsilon_{t+1} \le E^0[|x^0_{t+1} - \hat x^K_{t+1}|^2] \le E^0[|x^0_{t+1} - \tilde x_{t+1}|^2], \qquad t = 0,1,\ldots \qquad (9.7)$$

where the last inequality follows from the minimizing definition of the LLSE
estimator and the fact that $\tilde x$ is a sequence of linear estimates.
Therefore, we conclude that

$$I_F(1, t/\sigma^v) = \varepsilon_t \le E^0\left[\left|\frac{1}{t} \sum_{s=0}^{t-1} v^0_{s+1}\right|^2\right] = \frac{\sigma^v}{t}, \qquad t = 1,2,\ldots$$

and the result now follows since $\sigma^v$ is arbitrary. ∎

Whereas Lemma 9.3 gives us a uniform upper bound similar to that
of Proposition 7.5, our real interest in the next case is to show that if the
system triple $(a, h, \Sigma)$ is of a specific form, then the asymptotics of
$\varepsilon$ are not uniform over all non-Gaussian initial distributions
$F$; the asymptotics of $\varepsilon$ depend nontrivially on $F$. We shall use
the following two types of non-Gaussian distributions as examples of this:
Distribution $F_1$: Distribution $F_1$ admits a density with respect to
Lebesgue measure $\lambda$ on $\mathbb{R}$ of the form

$$\frac{dF_1}{d\lambda}(z) = \sum_{i=1}^{m} \alpha_i \frac{1}{\sqrt{2\pi\rho^2}} \exp\left[-\frac{1}{2} \frac{(z - \mu_i)^2}{\rho^2}\right], \qquad z \in \mathbb{R},$$

where $\rho > 0$, $0 < \alpha_i < 1$ for $i = 1,2,\ldots,m$,
$\sum_{i=1}^{m} \alpha_i = 1$ and $\sum_{i=1}^{m} \alpha_i \mu_i = 0$. We
exclude the trivial case where $F_1$ is Gaussian.

Distribution $F_2$: Under $F_2$, the rv $\xi$ takes on a finite number of
values $z_1 < z_2 < \ldots < z_m$ with strictly positive probabilities
$p_1, p_2, \ldots, p_m$, respectively, such that
$\sum_{i=1}^{m} p_i z_i = 0$.


Distributions of the type $F_1$ have been considered before in the context of
filtering theory [1], [2, Sec. 8.4], [26]. The following two important facts
about $F_1$ and $F_2$ are proved in [27]:

Fact 1. We have

$$I_{F_1}(1, r) = \frac{K + o(1)}{(\rho^2 r + 1)^2}, \qquad r > 0,$$

for some $K > 0$.

Fact 2. We also have

$$I_{F_2}(1, r) = \frac{1 + o(1)}{r}, \qquad r > 0.$$


We now can prove the following results, which concern the third case:

Proposition 9.4. If $h \ne 0$, $|\tilde a| = 1$ and $c = 0$, then
$\lim_t \varepsilon_t = 0$ for any distribution $F$ in
$\mathcal{D}(\mathbb{R})$, with $\lim_t \frac{1}{t} \ln \varepsilon_t \le 0$.
This convergence takes place at a rate which depends nontrivially upon $F$ for
non-Gaussian $F$.

Proof. Under the stated hypothesis, we have $p_t = 0$, $(q^*_t)^2 = 1$,
$r^*_t = h^2 t/\sigma^v$ and $\varepsilon_t = I_F(1, h^2 t/\sigma^v)$ for all
$t = 0,1,\ldots$ and all $F$ in $\mathcal{D}^0(\mathbb{R})$, the extension to
$\mathcal{D}(\mathbb{R})$ being as before. The conclusions
$\lim_t \varepsilon_t = 0$ and $\lim_t \frac{1}{t} \ln \varepsilon_t \le 0$
are immediate consequences of Lemma 9.3. However, direct calculations show
that if the initial distribution is $F_1$, then
$\lim_t t^2 \varepsilon_t = K(\sigma^v)^2/(\rho^4 h^4)$, whereas if the
initial distribution is $F_2$, then $\lim_t t\,\varepsilon_t = \sigma^v/h^2$
(thus $\lim_t \frac{1}{t} \ln \varepsilon_t = 0$ in both cases).
And finally, the fourth case, which like Proposition 9.4 displays a
nontrivial dependence on the initial distribution, is

Proposition 9.5. If $h \ne 0$, $|\tilde a| > 1$ and $c = 0$, then
$\limsup_t \varepsilon_t < \infty$ for all distributions $F$ in
$\mathcal{D}(\mathbb{R})$, the asymptotic behavior depending nontrivially
upon $F$ for non-Gaussian $F$.

Proof. Under the stated hypotheses on $(a, h, \Sigma)$, $p_t = 0$,
$(q^*_t)^2 = \tilde a^{2t}$ and
$r^*_t = \frac{h^2}{\sigma^v} \frac{\tilde a^{2t} - 1}{\tilde a^2 - 1}$ for
all $t = 0,1,\ldots$. Thus $\lim_t r^*_t = \infty$ with
$\lim_t (q^*_t)^2/r^*_t = \sigma^v(\tilde a^2 - 1)/h^2$ and we are led to
write

$$\varepsilon_t = (q^*_t)^2\, I_F(1, r^*_t) \le \frac{(q^*_t)^2}{r^*_t} = \frac{\sigma^v(\tilde a^2 - 1)}{h^2} \cdot \frac{\tilde a^{2t}}{\tilde a^{2t} - 1}, \qquad t = 1,2,\ldots \qquad (9.8)$$

where the inequality follows from Lemma 9.3. We now see that
$\limsup_t \varepsilon_t < \infty$ for all $F$ in $\mathcal{D}^0(\mathbb{R})$,
and thus for all distributions $F$ in $\mathcal{D}(\mathbb{R})$. However, if
$\xi$ has distribution $F_1$, then $\lim_t \varepsilon_t = 0$, whereas if
$\xi$ has distribution $F_2$, then
$\lim_t \varepsilon_t = \sigma^v(\tilde a^2 - 1)/h^2$. ∎
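
The limits claimed for $F_2$ in Propositions 9.4 and 9.5, and the bound of
Lemma 9.3, can be illustrated numerically in the simplest case of a symmetric
two-point prior $\xi = \pm 1$ with probability $1/2$ each. The sketch below
rests on assumptions that go beyond the text: it uses the identification from
the proof of Lemma 9.3 that $I_F(1, r)$ is the expected squared gap between
the MMSE and LLSE estimates of $\xi$ from a single observation in Gaussian
noise of variance $1/r$, for which the MMSE estimate is $\tanh(ry)$ and the
LLSE estimate is $ry/(1 + r)$. The function names and the quadrature
parameters are hypothetical.

```python
import math

def gap_two_point(r, half_width=10.0, n=4001):
    """Expected squared MMSE-LLSE gap for xi = +/-1 (probability 1/2 each)
    observed once in Gaussian noise of variance 1/r, by the trapezoid rule.
    By symmetry it suffices to condition on xi = +1."""
    step = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -half_width + i * step      # standard normal noise variable
        u = r + math.sqrt(r) * z        # u = r * y conditioned on xi = +1
        mmse = math.tanh(u)             # posterior mean E[xi | y]
        llse = u / (1.0 + r)            # best linear estimate (variance of xi is 1)
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * density * (mmse - llse) ** 2
    return total * step

def epsilon_two_point(a_tilde, h, sigma_v, t):
    """eps_t = (q_t)^2 * I_F(1, r_t) when c = 0, so that p_t = 0 throughout."""
    q_sq = a_tilde ** (2 * t)
    r = (h * h / sigma_v) * sum(a_tilde ** (2 * s) for s in range(t))
    return q_sq * gap_two_point(r)
```

With $h = \sigma^v = 1$, the product `t * epsilon_two_point(1.0, 1.0, 1.0, t)`
approaches $\sigma^v/h^2 = 1$ for large `t`, while
`epsilon_two_point(2.0, 1.0, 1.0, t)` approaches
$\sigma^v(\tilde a^2 - 1)/h^2 = 3$, in line with the two propositions.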

This  essentially  completes  our  analysis  of  the  scalar  case,  as  all  possible 
combinations  of  a,  c  and  h  have  now  been  considered. 

A  review  of  the  arguments  thus  far  reveals  that  a  large  portion  of 
our  calculations  relied  upon  scalar  manipulations  which  are  not  readily 
extendable  to  the  multivariable  case.  Several  remarks,  however,  do  give  us 
some  information  and  intuition  concerning  the  multivariable  case.  First, 
from  (7.31)  we  see  by  an  argument  similar  to  the  one  leading  to  (9.7)  that 

$$\varepsilon_t \le E^0[|x^0_t - \hat x^K_t|^2] = p^\delta_t, \qquad t = 1,2,\ldots$$

where the error variances $\{p^\delta_t,\ t = 0,1,\ldots\}$ are generated
through the recursion (9.1) (which is nothing but (1.7)) with initial
condition $p^\delta_0 = \delta$. The sequence
$\{p^\delta_t,\ t = 0,1,\ldots\}$ is either monotone nondecreasing or
monotone nonincreasing, and thus convergent, with limit point
$p^\delta_\infty$. Therefore, whenever $p^\delta_\infty < \infty$, we conclude
by inspection that

$$\varepsilon_t \le \max\{\delta, p^\delta_\infty\}, \qquad t = 1,2,\ldots \qquad (9.9)$$

In particular, under the conditions of Proposition 9.5, i.e., $h \ne 0$,
$|\tilde a| > 1$ and $c = 0$, we have (9.9) with
$p^\delta_\infty = \sigma^v(\tilde a^2 - 1)/h^2$ (a fact which is of course
in  agreement  with  (9.8)).  Analogous  calculations  using  (7.31)  can  be  made 
in  the  multivariable  case  to  yield  a  bound  corresponding  to  (9.9). 
A second comment on the multivariable case is the following. Our
analysis suggests the following classification: For any matrices $A$ and $C$
in $M_n$, the pair $(A, C)$ is said to be marginally stabilizable if all modes
which are neither stable nor critically stable are in the controllable
subspace.
With  this  notion,  we  can  now  rewrite  the  results  of  this  section  in  terms 
which  are  also  meaningful  for  the  multivariable  case.  As  such,  this  formu¬ 
lation  provides  a  useful  starting  point  for  investigating  the  asymptotics  in 
the  nonscalar  case. 

Theorem 9.6. We have the following convergence results:

1a. If the pair $(\tilde a, c)$ is marginally stabilizable, then
$\lim_t \varepsilon_t = 0$ for any distribution $F$ in
$\mathcal{D}(\mathbb{R})$; and

1b. If the pair $(\tilde a, c)$ is not marginally stabilizable, then the
asymptotic behavior of $\varepsilon$ depends nontrivially upon $F$ in
$\mathcal{D}(\mathbb{R})$.

Moreover we also have the following:

2a. If $(\tilde a, c)$ is stabilizable, then $\lim_t \varepsilon_t = 0$ at an
exponential rate independent of $F$ for non-Gaussian $F$ in
$\mathcal{D}(\mathbb{R})$; and

2b. If $(\tilde a, c)$ is marginally stabilizable but not stabilizable, then
the rate depends nontrivially upon $F$.
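
In the scalar case the classification of Theorem 9.6 reduces to a comparison
of $|\tilde a|$ with 1 and a test of whether $c$ vanishes. The sketch below
encodes that reading; the function name and the returned labels are
hypothetical, and it assumes $h \ne 0$ and $\tilde a \ne 0$ so that the
degenerate situation of Proposition 9.1 is excluded.

```python
def classify_scalar_pair(a_tilde, c):
    """Classify the scalar pair (a_tilde, c). The single mode a_tilde is
    controllable exactly when c != 0 (scalar reading, assumed here)."""
    controllable = c != 0
    if abs(a_tilde) < 1 or controllable:
        # Case 2a: exponential rate independent of F (Proposition 9.2).
        return "stabilizable"
    if abs(a_tilde) == 1:
        # Case 2b: convergence to zero at an F-dependent rate (Proposition 9.4).
        return "marginally stabilizable"
    # Case 1b: asymptotic behavior depends on F (Proposition 9.5).
    return "not marginally stabilizable"
```

The exact comparison `abs(a_tilde) == 1` is adequate for parameters entered
by hand; values produced by floating-point computation would need a tolerance.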